Systematic Software Testing Techniques: Combinatorial Testing Dr. Renée Bryce Professor University of North Texas Renee.Bryce@unt.edu
Presentation outline Introductions Motivation Background on combinatorial testing Examples, definitions, fun facts, and an example algorithm Try it! Create your own Pairwise Combinatorial Test Suite on paper Sneak peek at the ACTS tool (You’ll use this tool when Professor Wong returns next week.) BONUS QUESTION! Do you have an examples of where you could apply CT? Prioritized Combinatorial Testing Example algorithm Example applications Test Suite Reduction Definitions and examples Try it! Reduce a test suite using the HGS algorithm.
Introductions Briefly share your experiences with Software Testing
Motivation Costs of software defects Software defects cost ~$59 billion per year [1] One contributor to software defects Many system components are tested individually, but often unexpected interactions between components cause failures. [1] National Institute of Standards and Technology, The Economic Impacts of Inadequate Infrastructure for Software Testing, U.S. Department of Commerce, May 2002.
Combinatorial TestING Example Hardware Operating System Network Connection Memory PC Windows XP Dial-up 64MB Laptop Linux DSL 128MB PDA FreeBSD Cable 256MB Four factors (components) have three levels (options) each Sample test Test No. Hardware Operating System Network Connection Memory 1 PC Windows XP Dial-up 64MB Pairs covered 1. (PC, Windows XP) 4. (Windows XP, Dial-up) 2. (PC, Dial-up) 5. (Windows XP, 64MB) 3. (PC, 64MB) 6. (Dial-up, 64MB)
Combinatorial Testing Example Hardware Operating System Network Connection Memory PC Windows XP Dial-up 64MB Laptop Linux DSL 128MB PDA FreeBSD Cable 256MB Four factors (components) have three levels (options) each Test No. Hardware Operating System Network Connection Memory 1 PC Windows XP Dial-up 64MB 2 Laptop FreeBSD DSL 64MB 3 PDA Linux Cable 64MB 4 Laptop Windows XP Cable 128MB 5 Laptop Linux Dial-up 256MB 6 PC Linux DSL 128MB 7 PDA Windows XP DSL 256MB 8 PC FreeBSD Cable 256MB 9 PDA FreeBSD Dial-up 128MB
Fun Facts Combinatorial testing has been used in several fields: Agriculture Combinatorial chemistry Genomics Software/hardware testing Study of Mozilla web browser found 70% of defects with 2-way coverage; ~90% with 3-way; and 95% with 4-way. [Kuhn et. al., 2002] Combinatorial testing of 109 software-controlled medical devices recalled by US FDA uncovered 97% of flaws with 2-way coverage; and only 3 required higher than 2. [Kuhn et. al., 2004]
Covering arrays A covering array, , is an N x k array. In every N x t subarray, each t-tuple occurs at least λ times. In our application, t is the strength of the coverage of interactions, k is the number of components (factors), and v is the number of options for each component (levels). In all of our discussions, we treat only the case when λ = 1, (i.e. that every t-tuple must be covered at least once). Behind the scenes this combinatorial object is constructed to represent interaction test suites No efficient exact method is known Mathematicians and Computer Scientists have offered solutions from different view points. Their solutions have been measured by time to generate test suites and sizes of test suites.
Exercise 1 1. List all of the 2way combinations (pairs) for this input: Hint: Example pairs are (0,3) (0,4) (0,5)…. (8,11) 2. Create a combinatorial test suite for the input above. No credit will be given for an exhaustive test suite.
Perceived benefits of greedy algorithms Mathematical Greedy Search Size of test suites Accurate on special cases; but not as general as needed Reasonably accurate Most accurate (if given enough time) Time to generate tests Yes Often time consuming (for good results) Seeding/ Constraints Difficult to accommodate seeds/constraints
History of One-test-at-a-time Greedy Algorithms Pros First tool to generate test suites (based on covering arrays) Cons Produces different test suites to the same inputs Slow Deterministic Particularly good for mixed-level inputs Faster Overly large test suites for fixed-level inputs Competitive results Logarithmic guarantee on the size of test suites AETG TCG DDA
Sample sizes of test suites [1] R. Bryce, C.J. Colbourn. A Density-Based Greedy Algorithm for Higher Strength Covering Arrays, Journal of Software Testing, Verification, and Reliability, (March 2009), 19(1):37-53. [2] R. Bryce, C.J. Colbourn. The Density Algorithm for Pairwise Interaction Testing, Journal of Software Testing, Verification and Reliability, (August 2007), 17(3): 159-182. *(Citeseer impact ranking of STVR: .36)
Framework of One-row-at-a-time Greedy Methods Defines commonalities that all “one-row-at-a- time” greedy algorithms have in common A process provides statistical feedback on the impact of different decisions that can be made in the framework Experiments explore several thousand instantiations of the framework and provide a requisite of knowledge
Framework for greedy methods Layer one: test suite repetitions Layer two: multiple candidates Layer three: factor ordering* Layer four: level selection* Pairs left to cover: Test No. Hardware Operating System Network Connection Memory 48 1 PC Windows XP Dial-up 64MB 42 2 PC Linux DSL 128MB 36 3 PC FreeBSD Cable 256MB 4 ? ? ? ?
Framework for greedy methods Layer one: test suite repetitions Layer two: multiple candidates Layer three: factor ordering* Layer four: level selection* Factor 1 ? PC Laptop PDA Involved in 9 uncovered pairs Involved in 9 uncovered pairs Involved in no uncovered pairs
Framework for greedy methods Layer one: test suite repetitions Layer two: multiple candidates Layer three: factor ordering* Layer four: level selection* Laptop ? PC Laptop PDA Involved in 9 uncovered pairs Involved in 9 uncovered pairs Involved in no uncovered pairs
Framework for greedy methods Layer one: test suite repetitions Layer two: multiple candidates Layer three: factor ordering* Layer four: level selection* Laptop ? Factor 3 Dial-up DSL Cable Will cover 1 new pair Will cover 1 new pair Will cover 1 new pair
Framework for greedy methods Layer one: test suite repetitions Layer two: multiple candidates Layer three: factor ordering* Layer four: level selection* Laptop ? DSL Dial-up DSL Cable Will cover 1 new pair Will cover 1 new pair Will cover 1 new pair
Framework for greedy methods Layer one: test suite repetitions Layer two: multiple candidates Layer three: factor ordering* Layer four: level selection* Laptop Factor 2 DSL ? WinXP Linux FreeBSD Will cover 2 new pair Will cover 1 new pair Will cover 2 new pairs
Framework for greedy methods Layer one: test suite repetitions Layer two: multiple candidates Layer three: factor ordering* Layer four: level selection* Laptop WinXP DSL ? WinXP Linux FreeBSD Will cover 2 new pair Will cover 1 new pair Will cover 2 new pairs
Framework for greedy methods Layer one: test suite repetitions Layer two: multiple candidates Layer three: factor ordering* Layer four: level selection* Laptop WinXP DSL Factor 4 64MB 128MB 256MB Will cover 2 new pair Will cover 2 new pair Will cover 3 new pairs
Framework for greedy methods Layer one: test suite repetitions Layer two: multiple candidates Layer three: factor ordering* Layer four: level selection* Laptop WinXP DSL 64MB 64MB 128MB 256MB Will cover 2 new pair Will cover 2 new pair Will cover 3 new pairs
Framework for greedy methods Layer one: test suite repetitions Layer two: multiple candidates* Layer three: factor ordering Layer four: level selection Number of newly covered pairs 3 candidate rows PDA Linux Cable 128MB 6 5 PDA Win XP Cable 64MB 5 PDA Linux Cable 128MB
Framework for greedy methods Layer one: test suite repetitions* Layer two: multiple candidates Layer three: factor ordering Layer four: level selection Test suite A Test suite B Test suite C Select the smallest test suite generated.
Framework experiment - ANOVA results for several inputs [1] R.Bryce, C.J. Colbourn, M.B. Cohen. A Framework of Greedy Methods for Constructing Interaction Tests. The 27th International Conference on Software Engineering (ICSE), St. Louis, Missouri. (May 2005), pp. 146-155. (13% acceptance rate)
Sneak Peek at ACTS Professor Wong will cover this next week, but here is a quick glance. Learn more at: http://csrc.nist.gov/groups/SNS/acts/download/ Download it at**: http://www.cse.unt.edu/~reneebryce/ACTS ** If you use ACTS at your company, please email NIST for permission first d.kuhn@nist.gov
Test Suite Prioritization and Reduction by Combinational-based Criteria Dr. Renée Bryce Associate Professor University of North Texas Renee.Bryce@unt.edu
Prioritized combinatorial testing What if parts of a system are more important to test earlier? What if a tester learns during testing and wants to regenerate a test suite with new priorities? What if a tester has time to run pair-wise coverage and time to run some three-way tests?
A variation of the covering array An ℓ-biased covering array is a covering array CA(N; 2, k, v) in which the first rows form tests whose total benefit is as large as possible. That is, no CA(N; 2, k, v) has rows that provide larger total benefit. Input 3 factors with varying numbers of associated levels (options) and weights assigned to each level
Algorithm Walkthrough Input 3 factors with varying numbers of associated levels (options) and weights assigned to each level Larger weight means higher priority should be given to testing earlier! Step 1 – Calculate Factor Interaction Weights. Factors will be assigned values in order of “highest priority”
Algorithm Walkthrough Input Input has been processed Factor interaction weights have been calculated (to determine order to assign level values to factors). Step 2 – Calculate Factor-Level Interaction Weights to select the level that covers the most uncovered weighted density. For a factor, i, and a level, ℓ, that number of levels for a factor is called , and the factor interaction weight is called Factor-Level Interaction Weight = Remember, the factor interaction weight was 1.3 for factor 2.
Algorithm Walkthrough For a factor, i, and a level, ℓ, that number of levels for a factor is called , and the factor interaction weight is called Factor-Level Interaction Weight = Step 2 (continued) - Calculate Factor-Level Interaction Weights to select the level that covers the most uncovered weighted density. Weight between two levels divided by max factor-interaction weight Weight between two levels divided by max factor-interaction weight + Weight of level multiplied by weight of level (of fixed factor) Weights of levels between each multiplied
Sample Results Input Output [1] R. Bryce, C.J. Colbourn. Prioritized Interaction Testing for Pairwise Coverage with Seeding and Avoids, Information and Software Technology Journal (IST, Elsevier), (October 2006), 48(10):960-970.
Test Suite Prioritization Problem: Given T, a test suite, Π, the set of all test suites obtained by permuting the tests of T, and f, a function from Π to the set of real numbers, the problem is to find π∈Π such that ∀π′ ∈Π,f(π)≥f(π′). In this definition, Π refers to the possible prioritizations of T and f is a function applied to evaluate the orderings. Convert the web logs to a user-session-based test suite. POST/GET requests The test suite is large!
Case Study: Prioritizing User-session-based Test Suites Methodology: Convert web logs to user-session-based test suites, prioritize, and write to an XML format. Algorithm: Efficiently prioritize by combinatorial-based coverage for large test suites Empirical Studies: Families of empirical studies to analyze the effectiveness in relation to characteristics of the applications and test suites.
Research Questions Can we improve the rate of fault detection for user-session-based testing with new prioritization criteria? Which techniques are valuable in different scenarios? i.e.: tests have a high/low Fault Detection Density i.e.: predicted distribution of faults (deemed from prior versions of the software) Can we fine tune the criteria? i.e.: cost-based prioritization
Prioritization Metrics Test length based on number of base requests: order by the number of HTTP requests in a test case Frequency-based prioritization: order such that test cases that cover most frequently accessed pages/sequence of pages are selected for execution before test cases that exercise the less frequently accessed pages/sequences of pages. Unique coverage of parameter-values: order tests to cover all unique parameter-values as soon as possible 2-way parameter-value interaction coverage: order tests to cover all pair-wise combinations of parameter-values between pages as soon as possible Test length based on number of parameter-value: order by number of parameter-values used in a test case Random: randomly permute the order of tests
Empirical Studies TerpCalc, TerpPaint, Terp Spreadsheet, and TerpWord Online Bookstore Online Course Project Manager (CPM) Online Conference Management System SchoolMate Online Music Store Metavist (sponsored by USDA)
Results for an on-line system for a Course Project Manager and 890 Test Cases [1] R. Bryce, S. Sampath, A. Memon. Developing a Single Model and Test Prioritization Strategies for Event-Driven Software, Transactions on Software Engineering, (January 2011), 37(1):48-64.
Sample results % of test suite run Most frequent requests No. of Requests Long to short Short to long PVs Long to Short 1-way 2-way Random 10 85.28 78.17 75.14 83.53 16.38 83.79 83.72 48.63 20 88.52 80.34 77.76 88.77 25.6 87.78 90.8 57.55 30 89.4 81.77 80.27 26.44 91.54 91.72 64.51 40 89.86 84.58 81.39 92.71 28.76 94.79 95.64 69.19 50 91.04 85.58 82.95 30.33 73.03 60 91.58 87.14 84.44 94.26 34.64 75.37 70 92.1 87.74 85.15 39.15 77.37 80 92.35 88.27 86.21 39.58 78.24 90 92.37 88.3 86.31 42.18 94.99 78.45 100 92.45 88.36 86.35 43.09 78.49
Test prioritization by interaction coverage Test suite prioritization GUI-based testing
Empirical Studies Traffic Collision Avoidance System GUI-based Testing Word processor Spreadsheet Paint Calculator Web application Testing Bookstore Course Project Manager Conference Management Software
Transfer of Work Potential users that have contacted NIST to use our tool: AT&T BBC (for Winter Olympics website) Booz Allen Hamilton Angel.com U.S. Army Test and Evaluation Research Laboratory, Aberdeen Proving Ground A2Z Research and Development NASA IV&V
Transfer of Work [1] S. Sampath, R. Bryce, S. Jain, S. Manchester. A Tool for Combinatorial-based Prioritization and Reduction of User-Session-Based Test Suites, International Conference on Software Maintenance (ICSM) - Tool Demonstration Track, Williamsburg, VA, September 2011
Test Suite Reduction Problem: Given T, a test suite with test cases { }, a set of testing requirements,{ }, that must be satisfied to provide the desired test coverage of the program, and subsets { } of T, one associated with each of the s such that any one of the tests belonging to Ti satisfies . Find the minimal cardinality subset of T that exercises all of the requirements exercised by the original test suite T. Original Test Suite (Too large for our budget) Reduced Test Suite (Fits into budget) 45
Reduction Example Original Test Suite {t1,t2,t3,t4} Requirements covered by the test suite {r1,r2,r3,r4} Problem: Reduce the test suite such that it maintains coverage of these requirements
Test Suite Reduction Example Requirement Ti 1 {t3,t4} 2 {t4} 3 {t1, t2, t3, t5} 4 {t1, t2, t3} In this example, there are three possible solutions. We highlighted 1: {t1, t4}
Test Suite Reduction Example HGS Algorithm Select t5 since it is of cardinality 1 2. Consider unmarked Tis of cardinality 2, that is T4, T5, T6. Select the test that appears in the most Tis. That is a tie between t1 and t6. Break the tie by examining sets of cardinality (m+1), That is sets T3 and T7. Break tie between t1 and t6, by selecting t1 as it is in T3. T Requirement Ti 1 {t1, t5} 2 {t5} 3 {t1, t2, t3} 4 {t3, t6} 5 {t1, t4} 6 {t1, t6} 7 {t3, t4, t7} 8 {t2, t3, t4, t7} 3. T4 is of cardinality 2, there is a tie between t3 and t6, so we look at sets of size cardinality (m+1). We choose t3. Reduced Test Suite: {t5, t1, t3}
Exercise T Requirement Ti 1 {t1, t5} 2 {t5} 3 {t1, t2, t3} 4 {t3, t6} Reduce this test suite using the HGS algorithm: T Requirement Ti 1 {t1, t5} 2 {t5} 3 {t1, t2, t3} 4 {t3, t6} 5 {t1, t4} 6 {t1, t6} 7 {t3, t4, t7} 8 {t2, t3, t4, t7}
Test Suite Reduction Example HGS Algorithm Select t5 since it is of cardinality 1 2. Consider unmarked Tis of cardinality 2, that is T4, T5, T6. Select the test that appears in the most Tis. That is a tie between t1 and t6. Break the tie by examining sets of cardinality (m+1), That is sets T3 and T7. Break tie between t1 and t6, by selecting t1 as it is in T3. T Requirement Ti 1 {t1, t5} 2 {t5} 3 {t1, t2, t3} 4 {t3, t6} 5 {t1, t4} 6 {t1, t6} 7 {t3, t4, t7} 8 {t2, t3, t4, t7} 3. T4 is of cardinality 2, there is a tie between t3 and t6, so we look at sets of size cardinality (m+1). We choose t3. Reduced Test Suite: {t5, t1, t3}