Testing slides compiled from Alex Aiken’s, Neelam Gupta’s, Tao Xie’s.

CS590F Software Reliability Why Testing • Researchers investigate many approaches to improving software quality • But the world tests • > 50% of the cost of software development is testing • Testing is consistently a hot research topic

CS590F Software Reliability

Note  Fundamentally, seems to be not as deep Over so many years, simple ideas still deliver the best performance  Recent trends Testing + XXX DAIKON, CUTE  Messages conveyed Two messages and one conclusion.  Difficuties Beat simple ideas (sometimes hard) Acquire a large test suite

CS590F Software Reliability Outline • Testing practice. Goals: understand the current state of practice • Boring, but necessary for good science: we need to understand where we are before we try to go somewhere else • Testing research • Some textbook concepts

CS590F Software Reliability Testing Practice

CS590F Software Reliability Outline • Manual testing • Automated testing • Regression testing • Nightly build • Code coverage • Bug trends

CS590F Software Reliability Manual Testing • Test cases are lists of instructions ("test scripts") • Someone manually executes the script, doing each action step by step (click on "login", enter username and password, click "OK", ...) and manually records the results • Low-tech, simple to implement

CS590F Software Reliability Manual Testing • Manual testing is very widespread. Probably not dominant, but very, very common • Why? Because some tests can't be automated (usability testing) and some tests shouldn't be automated (not worth the cost)

CS590F Software Reliability Manual Testing • Those are the best reasons • There are also not-so-good reasons, not-so-good because innovation could remove them: testers aren't skilled enough to handle automation; automation tools are too hard to use; the cost of automating a test is 10x that of running it manually

CS590F Software Reliability Topics • Manual testing • Automated testing • Regression testing • Nightly build • Code coverage • Bug trends

CS590F Software Reliability Automated Testing • Idea: record a manual test, play it back on demand • This doesn't work as well as expected; e.g., some tests can't or shouldn't be automated

CS590F Software Reliability Fragility • Test recording is usually very fragile: it breaks if the environment changes anything, e.g., the location or background color of a textbox • More generally, automation tools cannot generalize a test: they literally record exactly what happened, and if anything changes, the test breaks • A hidden strength of manual testing: because people are doing the tests, the ability to adapt tests to slightly modified situations is built in

CS590F Software Reliability Breaking Tests • When code evolves, tests break. E.g., change the name of a dialog box and any test that depends on that name breaks • Maintaining tests is a lot of work: broken tests must be fixed, and this is expensive • Cost is proportional to the number of tests, which implies that more tests is not necessarily better

CS590F Software Reliability Improved Automated Testing • Recorded tests are too low level; e.g., every test contains the name of the dialog box • Need to abstract tests: replace the dialog box string by a variable name X, and maintain X in one place • So that when the dialog box name changes, only X needs to be updated and all the tests work again
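
A minimal sketch of that abstraction in Python. The LOGIN_DIALOG constant and the FakeUI driver are invented stand-ins for whatever GUI-automation layer is actually in use; the point is only that the UI string lives in one place.

    # One module owns the UI strings; tests never mention the literal name.
    LOGIN_DIALOG = "Sign in"   # hypothetical dialog title: rename it here, not in every test

    class FakeUI:
        """Stand-in for a real GUI-automation driver (assumed API)."""
        def open_dialog(self, title):
            print("opening dialog:", title)
            return self
        def type(self, field, value):
            print("typing into", field)
            return self
        def click(self, button):
            print("clicking", button)
            return self

    def test_login(ui):
        dialog = ui.open_dialog(LOGIN_DIALOG)   # abstracted: only LOGIN_DIALOG knows the string
        dialog.type("username", "alice").type("password", "secret").click("OK")

    test_login(FakeUI())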

CS590F Software Reliability Data Driven Testing (for Web Applications) • Build a database of event tuples • E.g., ... • A test is a series of such events chained together • A complete system will have many relations, as complicated as any large database
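
A rough illustration of the data-driven idea in Python. The field names (action, target, value) and the driver methods are assumptions made for this sketch; a real system would keep such rows in a database with many more relations.

    # Each row is one event tuple; a test is an ordered series of rows (assumed schema).
    LOGIN_TEST = [
        {"action": "navigate", "target": "/login",   "value": None},
        {"action": "type",     "target": "username", "value": "alice"},
        {"action": "type",     "target": "password", "value": "secret"},
        {"action": "click",    "target": "OK",       "value": None},
    ]

    class PrintDriver:
        """Stand-in driver; a real one would talk to the browser."""
        def navigate(self, target, value): print("navigate", target)
        def type(self, target, value):     print("type", target, value)
        def click(self, target, value):    print("click", target)

    def run_test(events, driver):
        """Generic player: interprets event tuples against a driver."""
        for e in events:
            getattr(driver, e["action"])(e["target"], e["value"])

    run_test(LOGIN_TEST, PrintDriver())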

CS590F Software Reliability Discussion • Testers have two jobs: clarify the specification and find (important) bugs • Only the latter is subject to automation • This helps explain why there is so much manual testing

CS590F Software Reliability Topics • Manual testing • Automated testing • Regression testing • Nightly build • Code coverage • Bug trends

CS590F Software Reliability Regression Testing • Idea: when you find a bug, write a test that exhibits the bug, and always run that test when the code changes, so that the bug doesn't reappear • Without regression testing, it is surprising how often old bugs reoccur
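
A small sketch of that workflow in Python with the standard unittest module. The parse_date function and the bug it once had are invented purely to show the shape of a regression test.

    import unittest

    def parse_date(s):
        """Toy function under test: 'YYYY-MM-DD' -> (year, month, day)."""
        y, m, d = s.split("-")
        return int(y), int(m), int(d)

    class RegressionTests(unittest.TestCase):
        def test_day_with_leading_zero(self):
            # Hypothetical bug report: days like "05" were once mis-parsed.
            # This test pins the fix and runs on every change so the bug cannot quietly return.
            self.assertEqual(parse_date("2024-11-05"), (2024, 11, 5))

    if __name__ == "__main__":
        unittest.main()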

CS590F Software Reliability Regression Testing (Cont.) • Regression testing ensures forward progress: we never go back to old bugs • Regression testing can be manual or automatic; ideally, run regressions after every change to detect problems as quickly as possible • But regression testing is expensive, which limits how often it can be run in practice; reducing its cost is a long-standing research problem

CS590F Software Reliability Nightly Build • Build and test the system regularly (every night) • Why? Because it is easier to fix problems earlier than later: it is easier to find the cause after one change than after 1,000 changes, and it keeps new code from building on buggy code • The test is usually a subset of the full regression suite (a "smoke test"): just make sure there is nothing horribly wrong

CS590F Software Reliability A Problem • So far we have: measure changes regularly (nightly build) and make monotonic progress (regression) • How do we know when we are done? We could keep going forever • But testing can only find bugs, not prove their absence; we need a proxy for the absence of bugs

CS590F Software Reliability Topics • Manual testing • Automated testing • Regression testing • Nightly build • Code coverage • Bug trends

CS590F Software Reliability Code Coverage • Idea: code that has never been executed likely has bugs • This leads to the notion of code coverage: divide a program into units (e.g., statements) and define the coverage of a test suite to be (# of statements executed by the suite) / (# of statements)
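
A tiny Python sketch of the metric itself; in practice the set of executed statements comes from an instrumentation tool rather than being written out by hand, and the line numbers below are made up.

    def statement_coverage(executed_lines, all_lines):
        """Fraction of the program's statements exercised by the suite."""
        return len(set(executed_lines) & set(all_lines)) / len(set(all_lines))

    all_lines = {10, 11, 12, 13}      # hypothetical statements in the program
    executed  = {10, 11, 13}          # statements hit by the test suite
    print("coverage = {:.0%}".format(statement_coverage(executed, all_lines)))   # 75%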

CS590F Software Reliability Code Coverage (Cont.) • Code coverage has proven value: it's a real metric, though far from perfect • But 100% coverage does not mean no bugs, e.g., a bug visible only after a loop executes 1,025 times • And 100% coverage is almost never achieved: there are infeasible paths, and shipping happens with < 60% coverage • High coverage may not even be desirable; it may be better to devote more time to tricky parts that already have good coverage
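
A concrete Python illustration of the first caveat: a one-element input covers every statement below, yet the failure only appears once the loop has run past the 1,024th iteration.

    def copy_into_buffer(items):
        buf = [0] * 1024
        for i, x in enumerate(items):
            buf[i] = x          # IndexError once len(items) > 1024
        return buf

    copy_into_buffer([42])              # 100% statement coverage, no failure
    # copy_into_buffer(range(2000))     # the bug: triggered only by a long input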

CS590F Software Reliability Using Code Coverage • Code coverage helps identify weak test suites • Code coverage can't complain about missing code, but it can hint at missing cases • Areas of poor coverage ⇒ areas where not enough thought has been given to the specification

CS590F Software Reliability More on Coverage • Statement coverage • Edge coverage • Path coverage • Def-use coverage

CS590F Software Reliability Topics • Manual testing • Automated testing • Regression testing • Nightly build • Code coverage • Bug trends

CS590F Software Reliability Bug Trends • Idea: measure the rate at which new bugs are found • Rationale: when this rate flattens out, it means (1) the cost per bug found is increasing dramatically and (2) there aren't many bugs left to find
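
One toy way to make the flattening criterion operational, sketched in Python; the weekly counts and the threshold are invented numbers.

    new_bugs_per_week = [30, 28, 22, 15, 9, 4, 3, 2]   # hypothetical bug-trend data

    def looks_flat(counts, window=3, threshold=5):
        """Stopping heuristic: the average of the last `window` weeks is below `threshold`."""
        recent = counts[-window:]
        return sum(recent) / len(recent) < threshold

    print(looks_flat(new_bugs_per_week))   # True: each new bug is getting expensive to find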

CS590F Software Reliability The Big Picture • Standard practice: measure progress often (nightly builds), make forward progress (regression testing), and have a stopping condition (coverage, bug trends)

CS590F Software Reliability Testing Research

CS590F Software Reliability Outline • Overview of testing research: definitions, goals • Three topics: random testing, efficient regression testing, mutation analysis

CS590F Software Reliability Overview • Testing research has a long history, going back at least to the 1960s • Much work is focused on metrics (assigning numbers to programs, assigning numbers to test suites), heavily influenced by industry practice • More recent work focuses on deeper analysis: semantic analysis, in the sense we understand it

CS590F Software Reliability What is a Good Test? • Attempt 1: if program P implements function F on domain D, then a test set T ⊆ D is reliable if (∀t ∈ T. P(t) = F(t)) ⇒ (∀t ∈ D. P(t) = F(t)) • This says that a good test set is one that implies the program meets its specification

CS590F Software Reliability Good News/Bad News • Good news: there are interesting examples of reliable test sets. Example: a function that sorts N numbers using comparisons sorts correctly iff it sorts all inputs consisting of 0s and 1s correctly, and this is a finite reliable test set • Bad news: there is no effective method for generating finite reliable test sets

CS590F Software Reliability An Aside • It's clear that reliable test sets must be impossible to compute in general • But most programs are not diagonalizing Turing machines... • It ought to be possible to characterize finite reliable test sets for certain classes of programs

CS590F Software Reliability Adequacy • Reliability is not useful if we don't have a full reliable test set; then it is possible that the program passes every test but is not the program we want • A different definition: if program P implements function F on domain D, then a test set T ⊆ D is adequate if (∀ programs Q. Q(D) ≠ F(D)) ⇒ (∃t ∈ T. Q(t) ≠ F(t))
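
For readability, the two definitions from these slides written out in display form (LaTeX), with the set and quantifier symbols restored:

    % P implements F on domain D; T \subseteq D is the test set.
    \text{reliable: } \bigl(\forall t \in T.\; P(t) = F(t)\bigr) \Rightarrow \bigl(\forall t \in D.\; P(t) = F(t)\bigr)
    \text{adequate: } \forall Q.\; Q(D) \neq F(D) \Rightarrow \bigl(\exists t \in T.\; Q(t) \neq F(t)\bigr)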

CS590F Software Reliability Adequacy • Adequacy just says that the test suite must make every incorrect program fail • This seems to be what we really want

CS590F Software Reliability Outline • Overview of testing research: definitions, goals • Three topics: random testing, efficient regression testing, mutation analysis

CS590F Software Reliability Random Testing • About ¼ of Unix utilities crash when fed random input strings of up to 100,000 characters • What does this say about testing? • What does this say about Unix?
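
A bare-bones Python version of that experiment. The target utility ("cat" here) is a placeholder; the original study drove many utilities with many random strings and looked for crashes and hangs.

    import random, subprocess

    def fuzz_once(utility, max_len=100_000):
        """Feed one random byte string to a utility on stdin; report how it exited."""
        data = bytes(random.randrange(256) for _ in range(random.randrange(1, max_len)))
        proc = subprocess.run([utility], input=data, capture_output=True)
        return proc.returncode   # negative means the process was killed by a signal (e.g., SIGSEGV)

    for _ in range(10):
        rc = fuzz_once("cat")    # placeholder target; swap in any filter-style utility
        if rc < 0:
            print("crash: signal", -rc)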

CS590F Software Reliability What it Says About Testing • Randomization is a highly effective technique, and we use very little of it in software • "A random walk through the state space" • To say anything rigorous, we must be able to characterize the distribution of inputs: easy for string utilities, harder for systems with more arcane input, e.g., parsers for context-free grammars

CS590F Software Reliability What it Says About Unix • What sort of bugs did they find? Buffer overruns, format string errors, wild pointers/array out of bounds, signed/unsigned characters, failure to handle return codes, race conditions • Nearly all of these are problems with C! They would disappear in Java; the exceptions are the races and the unhandled return codes

CS590F Software Reliability A Nice Bug • csh !0%8f • ! is the history lookup operator, and no command begins with 0%8f • csh passes an error message "0%8f: Not found" to an error-printing routine, which prints it with printf(), so the %8f in the message is interpreted as a format directive

CS590F Software Reliability Outline • Overview of testing research: definitions, goals • Three topics: random testing, efficient regression testing, mutation analysis

CS590F Software Reliability Efficient Regression Testing • Problem: regression testing is expensive • Observation: changes don't affect every test, and tests whose behavior cannot have changed need not be run • Idea: use a conservative static analysis to prune the test suite

CS590F Software Reliability The Algorithm Two pieces: 1. Run the tests and record for each basic block which tests reach that block 2. After modifications, do a DFS of the new control flow graph. Wherever it differs from the original control flow graph, run all tests that reach that point
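
A simplified Python rendition of those two pieces, treating the control flow graph as a set of basic-block ids. The block ids and the coverage map are invented; a real tool would record coverage by instrumentation and find the changed blocks by diffing the old and new CFGs.

    # Piece 1 (recorded while running the suite): which tests reach which basic blocks.
    reaches = {
        "B1": {"t1", "t2", "t3"},
        "B2": {"t1", "t2"},
        "B3": {"t3"},
    }

    def tests_to_rerun(changed_blocks, reaches):
        """Piece 2: once the CFG comparison reports changed blocks, select only the tests that reach them."""
        selected = set()
        for block in changed_blocks:
            selected |= reaches.get(block, set())
        return selected

    # Suppose the edit touched only block B3: just t3 needs to run again.
    print(tests_to_rerun({"B3"}, reaches))   # {'t3'}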

CS590F Software Reliability Example • (CFG figure) Label each node of the control flow graph with the set of tests (t1, t2, t3) that reach it

CS590F Software Reliability Example (Cont.) • (CFG figure) When a statement is modified, rerun just the tests reaching that statement

CS590F Software Reliability Experience • This works, and it works better on larger programs: the number of test cases to rerun is reduced by > 90% • Total cost is less than the cost of running all tests (total cost = cost of tests run + cost of the tool) • Impact analysis

CS590F Software Reliability Outline • Overview of testing research: definitions, goals • Three topics: random testing, efficient regression testing, mutation analysis

CS590F Software Reliability Adequacy (Review) • If program P implements function F on domain D, then a test set T ⊆ D is adequate if (∀ programs Q. Q(D) ≠ F(D)) ⇒ (∃t ∈ T. Q(t) ≠ F(t)) • But we can't afford to quantify over all programs...

CS590F Software Reliability From Infinite to Finite • We need to cut down the size of the problem: check adequacy with respect to a smaller set of programs • Idea: just check a finite number of (systematic) variations on the program, e.g., replace x > 0 by x < 0, or replace I by I+1, I-1 • This is mutation analysis

CS590F Software Reliability Mutation Analysis • Modify (mutate) each statement in the program in finitely many different ways; each modification is one mutant • Check for adequacy with respect to the set of mutants: find a set of test cases that distinguishes the program from the mutants • If program P implements function F on domain D, then a test set T ⊆ D is adequate if (∀ mutants Q. Q(D) ≠ F(D)) ⇒ (∃t ∈ T. Q(t) ≠ F(t))

CS590F Software Reliability What Justifies This? • The "competent programmer assumption": the program is close to right to begin with • It makes the infinite finite; we will inevitably do this anyway, and at least here it is clear what we are doing • This already generalizes existing metrics; if it is not the end of the road, at least it is a step forward

CS590F Software Reliability The Plan • Generate mutants of program P • Generate tests (by some process) • For each test t, for each mutant M: if M(t) ≠ P(t), mark M as killed • If the tests kill all mutants, the tests are adequate
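
The plan above written out as a small Python loop. The program, the hand-written mutants, and the test inputs are toy stand-ins; real mutation tools generate the mutants automatically from the source.

    def program(x):                  # original P: is x positive?
        return x > 0

    mutants = {                      # assumed mutants of the same predicate
        "m1": lambda x: x < 0,       # relational operator replaced
        "m2": lambda x: x > 1,       # constant changed
        "m3": lambda x: x >= 0,      # boundary mutated
    }
    tests = [-5, 0, 1, 7]

    killed = set()
    for t in tests:
        for name, mutant in mutants.items():
            if mutant(t) != program(t):      # mutant behaves differently on t: killed
                killed.add(name)

    surviving = set(mutants) - killed
    print("adequate" if not surviving else "surviving mutants: %s" % sorted(surviving))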

CS590F Software Reliability The Problem • This is dreadfully slow: lots of mutants, lots of tests, and running each mutant on each test is expensive • But early efforts more or less did exactly this

CS590F Software Reliability Simplifications • To make progress, we can either strengthen our algorithms or weaken our problem • To weaken the problem: selective mutation (don't try all of the mutants), or weak mutation (check only that the mutant produces a different state after the mutated statement, not a different final output), which is about 50% cheaper

CS590F Software Reliability Better Algorithms • Observation: mutants are nearly the same as the original program • Idea: compile one program that incorporates and checks all of the mutations simultaneously, a so-called meta-mutant

CS590F Software Reliability Metamutant with Weak Mutation • Constructing a metamutant for weak mutation is straightforward • Each statement has a set of mutated statements, with any updates done to fresh variables: for X := Y > 1, each mutant computes its result into a fresh variable X_1, X_2, ... • After the statement, check whether the values differ: X == X_1? X == X_2?
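
A Python caricature of that construction for the single statement above. The two mutated versions are assumptions; the point is that one compiled program evaluates the original and all mutated statements side by side and records which mutants already diverge in state.

    def metamutant_step(y, killed):
        """Original statement plus its mutants, evaluated together (weak mutation)."""
        x   = y > 1        # original:  X := Y > 1
        x_1 = y < 1        # mutant 1: operator flipped (assumed mutant)
        x_2 = y > 0        # mutant 2: constant changed (assumed mutant)
        # Weak-mutation check: a mutant is killed as soon as its state differs here,
        # without running the rest of the program to compare final outputs.
        if x != x_1: killed.add("m1")
        if x != x_2: killed.add("m2")
        return x

    killed = set()
    for y in [0, 1, 2]:              # a small test set
        metamutant_step(y, killed)
    print("killed:", sorted(killed)) # ['m1', 'm2']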

CS590F Software Reliability Comments • A metamutant for weak mutation should be quite practical: only a constant-factor slowdown over the original program • It is not clear how to build a metamutant for stronger mutation models

CS590F Software Reliability Generating Tests • Mutation analysis seeks to generate adequate test sets automatically • Must determine inputs such that the mutated statement is reached, and the mutated statement produces a result different from the original (the states s and s' differ)
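
A brute-force Python illustration of those two conditions; a real generator would derive symbolic constraints and solve them instead of enumerating inputs, and the program and mutant here are invented.

    def original(x):
        if x >= 0:                  # the mutated statement is reached only when x >= 0
            return x > 0            # original predicate
        return False

    def mutant(x):
        if x >= 0:
            return x >= 0           # mutated predicate (assumed mutant)
        return False

    # Find an input that (1) reaches the mutated statement and (2) makes the results differ.
    killing = [x for x in range(-10, 11) if x >= 0 and original(x) != mutant(x)]
    print(killing)                  # [0]: the only input in this range that kills the mutant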

CS590F Software Reliability Automatic Test Generation • This is not easy to do • Approaches: a backward approach works backwards from the statement to the inputs (taking short paths through loops), generates symbolic constraints on the inputs that must be satisfied, and solves for the inputs

CS590F Software Reliability Automatic Test Generation (Cont.) • Work forwards from the inputs: symbolic execution (Tao), concrete execution (CUTE), arithmetic-representation based (Gupta)

CS590F Software Reliability Comments on Test Generation • Apparently works well for small programs, without pointers, for certain classes of mutants • So it is not very clear how well it works in general (note: solutions for pointers have been proposed)

CS590F Software Reliability A Problem • What if a mutant is equivalent to the original? Then no test will kill it • In practice this is a real problem, and it is not easily solved: trying to prove program equivalence automatically often requires manual intervention, which undermines the metric • How about more complicated mutants?

CS590F Software Reliability Opinions • Mutation analysis is a good idea, for all the reasons cited before; it is also technically interesting, and there is probably more to do... • How important is automatic test generation? We still must manually look at the output of the tests, and that is a big chunk of the work anyway; one answer is to weaken the problem • Directed ATG is quite an interesting direction to go • Automatic tests are likely to be weird, which is both good and bad

CS590F Software Reliability Opinions • The testing research community is trying to learn from the programming languages community (slicing, dataflow analysis, etc.) and from the theorem proving community (verification conditions, model checking)

CS590F Software Reliability Some Textbook Concepts • Different levels of testing: system test, model test, unit test, integration test • Black box vs. white box (functional vs. structural)

CS590F Software Reliability Boundary Value Test • Given F(x1, x2) with constraints a ≤ x1 ≤ b and c ≤ x2 ≤ d (the slide shows this rectangular input space) • Boundary value analysis focuses on the boundary of the input space to identify test cases • Use input variable values at the minimum, just above the minimum, a nominal value, just below the maximum, and at the maximum
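
A short Python helper that produces those five values per variable and combines them under the usual single-fault assumption (vary one variable, hold the other at its nominal value). The bounds a, b, c, d are arbitrary example numbers, and a step of 1 assumes integer-valued inputs.

    def boundary_values(lo, hi, step=1):
        """min, just above min, nominal, just below max, max."""
        return [lo, lo + step, (lo + hi) // 2, hi - step, hi]

    a, b, c, d = 1, 10, 5, 20                      # example bounds: a <= x1 <= b, c <= x2 <= d
    x1_nom, x2_nom = (a + b) // 2, (c + d) // 2

    tests = sorted(set(
        [(x1, x2_nom) for x1 in boundary_values(a, b)] +
        [(x1_nom, x2) for x2 in boundary_values(c, d)]
    ))
    print(tests)                                   # the classic 4n+1 = 9 boundary-value cases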

CS590F Software Reliability Next Lecture • Program Slicing