
1 An Overview of High Volume Test Automation (Early Draft: Feb 24, 2012)
Cem Kaner, J.D., Ph.D., Professor of Software Engineering, Florida Institute of Technology
Acknowledgments: Many of the ideas presented here were developed in collaboration with Douglas Hoffman. These notes are partially based on research that was supported by NSF Grant CCLI “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

2 Abstract
This talk is an introduction to the start of a research program. Drs. Bond, Gallagher and I have some experience with high volume test automation, but we haven't done formal, funded research in the area. We've decided to explore it in more detail, with the expectation of supervising research students. We think this will be an excellent foundation for future employment in industry or at a university. If you're interested, you should talk with us.

Most discussions of automated software testing focus on automated regression testing. Regression testing reruns tests that have been run before. This type of testing makes sense for testing the manufacturing of physical objects, but it is wasteful for software. Automating regression tests *might* make them cheaper (if the test maintenance costs are low enough, which they often are not), but if a test doesn't have much value to begin with, how much should we be willing to spend to make it easier to reuse?

Suppose we decided to break away from the regression testing tradition and use our technology to create a steady stream of new tests instead. What would that look like? What would our goals be? What should we expect to achieve?

This is not yet funded research--we are still planning our initial grant proposals. We might not get funded, and if we do, we probably won't get anything for at least a year. So, if you're interested in working with us, you should expect to support yourself (e.g., via GSA) for at least a year and maybe longer.

3 Typical Testing Tasks
- Analyze product & its risks: benefits & features; risks in use; market expectations; interaction with external software; diversity / stability of platforms; extent of prior testing; assess source code
- Develop testing strategy: pick key techniques; prioritize testing foci
- Design tests: select key test ideas; create tests for each idea; design oracles (mechanisms for determining whether the program passed or failed a test)
- Assess the tests: debug the tests; polish their design; evaluate any bugs found by them
- Execute the tests: troubleshoot failures; report bugs; identify broken tests
- Document the tests: what test ideas or spec items does each test cover? what algorithms generated the tests? what oracles are relevant?
- Maintain the tests: recreate broken tests; redocument revised tests
- Manage test environment: set up the test lab; select / use hardware/software configurations; manage test tools
- Keep archival records: what tests have we run; what collections / suites provide what coverage

4 Regression testing
This is the most commonly discussed approach to automated testing:
- Create a test case.
- Run it and inspect the output.
- If the program fails, report a bug and try again later.
- If the program passes the test, save the resulting outputs.
In future testing:
- Run the program.
- Compare the output to the saved results.
- Report an exception whenever the current output and the saved output don't match.
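To make the mechanics concrete, here is a minimal sketch of such a golden-file regression harness in Python. Everything here is illustrative: the executable name, the test ids, and the golden/ directory layout are hypothetical, not from the talk.

```python
import subprocess
from pathlib import Path

GOLDEN_DIR = Path("golden")                  # hypothetical store of saved outputs
GOLDEN_DIR.mkdir(parents=True, exist_ok=True)

def run_program(test_input: str) -> str:
    """Run the program under test with one input (hypothetical command)."""
    result = subprocess.run(
        ["./program_under_test", test_input],
        capture_output=True, text=True, check=False,
    )
    return result.stdout

def regression_check(test_id: str, test_input: str) -> bool:
    """Compare the current output to the saved output for this test."""
    golden_file = GOLDEN_DIR / f"{test_id}.out"
    output = run_program(test_input)
    if not golden_file.exists():
        # First run: a human inspects the output, then it becomes the golden copy.
        golden_file.write_text(output)
        return True
    if output != golden_file.read_text():
        print(f"EXCEPTION: {test_id}: current output differs from saved output")
        return False
    return True
```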

5 Really? This is automation?
- Analyze product & its risks: Human
- Develop testing strategy: Human
- Design tests: Human
- Design oracles: Human
- Run each test the first time: Human
- Assess the tests: Human
- Save the code: Human
- Save the results for comparison: Human
- Document the tests: Human
- (Re-)Execute the tests: Computer
- Evaluate the results: Computer + Human
- Maintain the tests: Human
- Manage test environment: Human
- Keep archival records: Human

6 This is computer-assisted testing, not automated testing.
ALL testing is computer-assisted.

7 Other computer-assistance…
- Tools to help create tests
- Tools to sort, summarize, or evaluate test output or test results
- Tools (simulators) to help us predict results
- Tools to build models (e.g., state models) of the software, from which we can build tests and evaluate / interpret results
- Tools to vary inputs, generating a large number of similar (but not the same) tests on the same theme, at minimal cost for the variation
- Tools to capture test output in ways that make test result replication easier
- Tools to expose the API to the non-programmer subject matter expert, improving the maintainability of SME-designed tests
- Support tools for parafunctional tests (usability, performance, etc.)

8 Don't think "automated or not"
Think continuum: more to less. Not "can we automate?" Instead: "can we automate more?"

9 A hypothetical: system conversion (e.g., FileMaker application to SQL)
- A database application with 100 types of transactions, extensively specified (we know the fields involved in each transaction and know their characteristics via the data dictionary)
- 15,000 regression tests
Should we assess the new system by making it pass the regression tests? Maybe to start, but what about:
- Create a test generator to create high volumes of data combinations for each transaction.
- THEN: Randomize the order of transactions to check for interactions that lead to intermittent failures.
This lets us learn things we don't know, and ask / answer questions we don't know how to study in other ways.
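A minimal sketch of what such a transaction generator might look like; the transaction types, field generators, and apply_transaction driver are all hypothetical, not from the talk.

```python
import random

# Hypothetical data dictionary: one field-generator map per transaction type.
DATA_DICTIONARY = {
    "create_invoice": {"amount": lambda rng: round(rng.uniform(0, 1e6), 2),
                       "customer_id": lambda rng: rng.randint(1, 99999)},
    "void_invoice": {"invoice_id": lambda rng: rng.randint(1, 99999)},
    # ... one entry per transaction type (100 in the hypothetical system)
}

def random_transaction_stream(n: int, seed: int = 0):
    """Yield n transactions of random types, in random order.

    Seeding makes a failing sequence reproducible for troubleshooting."""
    rng = random.Random(seed)
    kinds = list(DATA_DICTIONARY)
    for _ in range(n):
        kind = rng.choice(kinds)
        fields = {name: gen(rng) for name, gen in DATA_DICTIONARY[kind].items()}
        yield {"kind": kind, "fields": fields}

# Usage (apply_transaction is a hypothetical driver into the system under test):
# for tx in random_transaction_stream(100_000, seed=42):
#     apply_transaction(tx)
```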

10 Suppose you decided to never run another regression test
Suppose you decided to never run another regression test. What kind of automation could you do?

11 [Diagram: a map of high-volume test automation. Test-generation approaches (fuzzing, sampling systems, long-sequence regression) connect to oracle types (model, reference, diagnostic, constraint) and to the dimensions we can vary: inputs (input filters, function, consequences, output filters), combinations, task sequences, file contents (input / reference / config), state transitions, and the execution environment.]

12 Issues that Drive Design of Test Automation
- Theory of error: What kinds of errors do we hope to expose?
- Input data: How will we select and generate input data and conditions?
- Sequential dependence: Should tests be independent? If not, what info should persist or drive the sequence from test N to N+1?
- Execution: How are test suites run, especially in case of individual test failures?
- Output data: Which outputs do we observe, and what dimensions of them?
- Comparison data: If detection is via comparison to oracle data, where do we get the data?
- Detection: What heuristics/rules tell us there might be a problem?
- Evaluation: How do we decide whether X is a problem or not?
- Troubleshooting support: A failure triggers what further data collection?
- Notification: How/when is a failure reported?
- Retention: In general, what data do we keep?
- Maintenance: How are tests / suites updated / replaced?
- Relevant contexts: Under what circumstances is this approach relevant/desirable?
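One way to make this checklist operational is to record the answers as a structured record per automation effort. A minimal sketch; the class name, field names, and example values are mine, not from the talk (the example paraphrases the long-sequence regression design described later in this deck).

```python
from dataclasses import dataclass, field

@dataclass
class AutomationDesign:
    """One record per high-volume automation effort, answering the
    design questions above. All field names are illustrative."""
    theory_of_error: str          # what kinds of errors we hope to expose
    input_data: str               # how inputs and conditions are selected / generated
    sequential_dependence: str    # what persists from test N to N+1, if anything
    execution: str                # how suites run, incl. handling individual failures
    output_data: str              # which outputs / dimensions we observe
    comparison_data: str          # where oracle data comes from, if any
    detection: str                # heuristics that flag a possible problem
    evaluation: str               # how we decide whether a flag is a real problem
    troubleshooting_support: str  # extra data collected on failure
    notification: str             # how / when failures are reported
    retention: str                # what data we keep
    maintenance: str              # how tests / suites are updated or replaced
    relevant_contexts: list[str] = field(default_factory=list)

# Example: a long-sequence regression design, filled in from the slides below.
lsr = AutomationDesign(
    theory_of_error="timing problems, memory corruption, memory leaks",
    input_data="sample from the pool of tests already passed in this build",
    sequential_dependence="deliberate: failures emerge from the sequence",
    execution="run sampled tests in random order until the software fails",
    output_data="crashes and diagnostics",
    comparison_data="none; weak oracle (run till crash)",
    detection="crash or diagnostic assertion",
    evaluation="human triage of reported failures",
    troubleshooting_support="keep the random seed and the executed sequence",
    notification="session report",
    retention="seeds, sequences, failure logs",
    maintenance="pool refreshed as the regression suite evolves",
    relevant_contexts=["builds with an existing pool of passing tests"],
)
```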

13 Primary drivers of our designs
The primary driver of a design is the key factor that motivates us or makes the testing possible. In Doug's and my experience, the most common primary drivers have been:
- Theory of error: we're hunting a class of bug that we have no better way to find.
- Available oracle: we have an opportunity to verify or validate a behavior with a tool.
- Ability to drive long sequences: we can execute a lot of these tests cheaply.

14 More on … Theory of Error
- Computational errors
- Communications problems: protocol errors; their-fault interoperability failures
- Resource unavailability or corruption, driven by: the history of operations; competition for the resource
- Race conditions or other time-related or thread-related errors
- Failures caused by toxic data value combinations: combinations that span a large portion or a small portion of the data space; combinations that are likely or unlikely to be visible in "obvious" tests based on customer usage or common heuristics
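One way to hunt such toxic combinations is to enumerate or randomly sample the cross-product of a few "interesting" values per field. A minimal sketch; the fields, values, and check_transaction driver are hypothetical, not from the talk.

```python
import itertools
import random

# Hypothetical "interesting" values per field, mixing likely and unlikely ones.
FIELD_VALUES = {
    "quantity": [0, 1, -1, 2**31 - 1],
    "discount": [0.0, 0.5, 1.0, -0.01],
    "currency": ["USD", "EUR", ""],
}

def all_combinations():
    """Exhaustive cross-product of the interesting values (small spaces only)."""
    names = list(FIELD_VALUES)
    for values in itertools.product(*(FIELD_VALUES[n] for n in names)):
        yield dict(zip(names, values))

def sampled_combinations(n: int, seed: int = 0):
    """Random sample for spaces too large to enumerate."""
    rng = random.Random(seed)
    names = list(FIELD_VALUES)
    for _ in range(n):
        yield {name: rng.choice(FIELD_VALUES[name]) for name in names}

# for combo in all_combinations():
#     check_transaction(combo)   # hypothetical driver plus oracle
```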

15 Simulate Events with Diagnostic Probes
1984: the first phone on the market with an LCD display, and one of the first PBXs with integrated voice and data. 108 voice features, 110 data features. Simulate traffic on the system, with:
- settable probabilities of state transitions
- diagnostic reporting whenever a suspicious event is detected
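A minimal sketch of this style of simulator for a single line, assuming a small state machine with settable transition probabilities; the states, probabilities, and the diagnostic check are all hypothetical.

```python
import random

# Hypothetical transition table for one phone line:
# state -> [(next_state, probability), ...]
TRANSITIONS = {
    "idle":      [("dialing", 0.7), ("ringing", 0.3)],
    "dialing":   [("connected", 0.8), ("idle", 0.2)],
    "ringing":   [("connected", 0.6), ("idle", 0.4)],
    "connected": [("idle", 0.9), ("connected", 0.1)],
}

def suspicious(history):
    """Hypothetical diagnostic probe: flag a call that never seems to end."""
    return len(history) >= 50 and all(s == "connected" for s in history[-50:])

def simulate(steps: int, seed: int = 0):
    """Drive random traffic; report whenever the probe fires."""
    rng = random.Random(seed)
    state, history = "idle", []
    for step in range(steps):
        nexts, weights = zip(*TRANSITIONS[state])
        state = rng.choices(nexts, weights=weights)[0]
        history.append(state)
        if suspicious(history):
            print(f"step {step}: suspicious event, recent states {history[-5:]}")

simulate(10_000, seed=1)
```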

16 More on … Available Oracle
Typical oracles used in test automation:
- Reference program
- Model that predicts results
- Embedded or self-verifying data
- Checks for known constraints
- Diagnostics
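As one example from this list, a constraint oracle never predicts the exact expected result; it checks only properties that every correct output must satisfy. A minimal sketch, with hypothetical constraints on an order record.

```python
def constraint_oracle(record: dict) -> list:
    """Return a list of violated constraints (empty list = no problem found).

    The constraints are hypothetical examples for an order record."""
    violations = []
    if record["total"] < 0:
        violations.append("total must be non-negative")
    if record["ship_date"] < record["order_date"]:
        violations.append("ship_date precedes order_date")
    if record["total"] != round(sum(record["line_items"]), 2):
        violations.append("total does not equal sum of line items")
    return violations

# Usage with any generated input: we don't know the right answer,
# only properties that every right answer must have.
record = {"total": 30.00, "order_date": "2012-02-01",
          "ship_date": "2012-02-03", "line_items": [10.00, 20.00]}
assert constraint_oracle(record) == []
```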

17 Function Equivalence Testing
MASPAR (the Massively Parallel computer, 64K parallel processors). The MASPAR computer has several built-in mathematical functions. We're going to consider the integer square root. This function takes a 32-bit word as its input. Any bit pattern in that word can be interpreted as an integer whose value is between 0 and 4,294,967,295, so there are 4,294,967,296 possible inputs to this function. It was tested against a reference implementation of square root.

18 Function Equivalence Test
The 32-bit tests took the computer only 6 minutes to run and to compare the results to the oracle. There were 2 (two) errors, neither of them near any boundary. (The underlying error was that a bit was sometimes mis-set, but in most error cases there was no effect on the final calculated result.) Without an exhaustive test, these errors probably wouldn't have shown up. For the 64-bit integer square root, the function equivalence tests used random sampling rather than exhaustive testing, because the full input set would have required 6 minutes × 2^32.
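A sketch of the technique in Python. math.isqrt serves as the reference implementation; function_under_test is a hypothetical stand-in for an independent implementation (here it simply delegates, so the sketch reports no mismatches). Pure Python would take far longer than the MASPAR's 6 minutes for the exhaustive pass; the point is the structure, not the speed.

```python
import math
import random

def function_under_test(x: int) -> int:
    """Hypothetical stand-in for an independently implemented integer sqrt."""
    return math.isqrt(x)

def exhaustive_equivalence_32bit() -> None:
    """Compare against the reference for every one of the 2**32 inputs."""
    for x in range(2**32):
        if function_under_test(x) != math.isqrt(x):
            print(f"MISMATCH at input {x}")

def sampled_equivalence_64bit(n: int = 1_000_000, seed: int = 0) -> None:
    """The 64-bit input space is too large to enumerate, so sample it."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.randrange(2**64)
        if function_under_test(x) != math.isqrt(x):
            print(f"MISMATCH at input {x}")
```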

19 This tests for equivalence of functions, but it is less exhaustive than it looks. (Acknowledgement: from Doug Hoffman.)
[Diagram: the system under test and the reference function both receive the intended inputs, but each also runs under its own program state, system state, configuration and system resources, and cooperating processes, clients or servers. We compare only the monitored outputs; the impacts on connected devices / resources, on cooperating processes, clients or servers, and on program state (and uninspected outputs) are not compared.]

20 More on … Ability to Drive Long Sequences
Any execution engine will (potentially) do:
- Commercial regression-test execution tools
- Customized tools for driving programs with (for example): messages (to be sent to other systems or subsystems); inputs that will cause state transitions; inputs for evaluation (e.g., inputs to functions)

21 Long-sequence regression
- Tests are taken from the pool of tests the program has passed in this build.
- The sampled tests are run in random order until the software under test fails (e.g., crashes).
- Typical defects found include timing problems, memory corruption (including stack corruption), and memory leaks.
- In a recent (2004) release, 293 reported failures exposed 74 distinct bugs, including 14 showstoppers.
Note that:
- these tests are no longer testing for the failures they were designed to expose;
- these tests add nothing to typical measures of coverage, because the statements, branches, and subpaths within them were covered the first time the tests were run in this build.
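A minimal sketch of a long-sequence regression runner, assuming each pooled test is runnable by id and that a crash surfaces as an exception; run_test and the pool are hypothetical.

```python
import random

def long_sequence_regression(passing_pool, run_test,
                             max_steps: int = 1_000_000, seed: int = 0):
    """Sample tests from the pool of already-passing tests and run them in
    random order until the software under test fails.

    passing_pool: list of test ids that passed individually in this build
    run_test: hypothetical callable; returns False or raises on failure
    """
    rng = random.Random(seed)
    sequence = []                      # keep the history for troubleshooting
    for step in range(max_steps):
        test_id = rng.choice(passing_pool)
        sequence.append(test_id)
        try:
            ok = run_test(test_id)
        except Exception as crash:     # e.g., the application crashed
            print(f"FAILED after {step + 1} tests: {crash!r}")
            return sequence            # the whole sequence matters, not the last test
        if not ok:
            print(f"FAILED after {step + 1} tests on {test_id}")
            return sequence
    print("No failure within the step budget")
    return sequence
```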

22 Imagining a structure for high-volume automated testing

23 Some common characteristics
- The tester codes a testing process rather than individual tests.
- Following the tester's algorithms, the computer creates tests (maybe millions of tests), runs them, evaluates their results, reports suspicious results (possible failures), and reports a summary of its testing session.
- The tests often expose bugs that we don't know how to design focused tests to look for: memory leaks, wild pointers, stack corruption, timing errors, and many other problems that are not anticipated in the specification but are clearly inappropriate (i.e., bugs).
- Traditional expected results (the expected result of 2+3 is 5) are often irrelevant.

24 What can we vary?
- Inputs to functions: to check input filters; to check operation of the function; to check consequences (what the other parts of the program do with the results of the function); to drive the program's outputs
- Combinations of data
- Sequences of tasks
- Contents of files: input files; reference files; configuration files
- State transitions: sequences in a state model; sequences that drive toward a result
- Execution environment: background activity; competition for specific resources
- Message streams

25 How can we vary them?
- Statistical or AI sampling
- Test selection optimized against some criteria
- Long-sequence regression
- Model-based oracles: e.g., a state machine; e.g., a mathematical model
- Reference program
- Diagnostic oracle
- Constraint oracle
- Fuzzing: random generation / selection of tests; an execution engine; a weak oracle (run till crash)
- Fuzzing examples: random inputs; random state transitions (dumb monkey); file contents; message streams; grammars
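As a sketch of the simplest fuzzing style in this list, a "dumb monkey" fires random actions with only a weak oracle (run till crash). The app object and its .do() interface are hypothetical.

```python
import random

def dumb_monkey(app, actions, steps: int = 100_000, seed: int = 0):
    """Fire random actions at the application until it crashes.

    app: hypothetical object whose .do(action) raises on a crash
    actions: list of action names, e.g. clicks and keystrokes
    The only oracle is survival: any exception counts as a failure.
    """
    rng = random.Random(seed)
    log = []
    for step in range(steps):
        action = rng.choice(actions)
        log.append(action)
        try:
            app.do(action)
        except Exception as crash:
            print(f"crash after {step + 1} actions: {crash!r}")
            return log          # replayable via the seed or the action log
    return log
```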

26 [Diagram: repeats the high-volume test automation map from slide 11: test-generation approaches, oracle types, and the dimensions we can vary.]

27 Issues that Drive Design of Test Automation
(This slide repeats the design-issues checklist from slide 12.)

28 About Cem Kaner
Professor of Software Engineering, Florida Tech.
I've worked in all areas of product development: programmer, tester, writer, teacher, user interface designer, software salesperson, organization development consultant, manager of user documentation, software testing, and software development, and attorney focusing on the law of software quality.
Senior author of three books:
- Lessons Learned in Software Testing (with James Bach & Bret Pettichord)
- Bad Software (with David Pels)
- Testing Computer Software (with Jack Falk & Hung Quoc Nguyen)
My doctoral research on psychophysics (perceptual measurement) nurtured my interests in human factors (usable computer systems) and measurement theory.

