Automatic System Testing of Programs without Test Oracles


Automatic System Testing of Programs without Test Oracles
Christian Murphy, Kuang Shen, Gail Kaiser
Columbia University

Problem Statement
- Some applications (e.g. machine learning, simulation) do not have test oracles that indicate whether the output is correct for arbitrary input
- Oracles may exist for a limited subset of the input domain, and gross errors (e.g. crashes) can be detected with certain inputs or techniques
- However, it is difficult to detect subtle (computational) errors for arbitrary inputs in such "non-testable programs"

Observation
- If there is no oracle in the general case, we cannot know the expected relationship between a particular input and its output
- However, it may be possible to know relationships between sets of inputs and the corresponding set of outputs
- "Metamorphic Testing" [Chen et al. '98] is such an approach

Metamorphic Testing
- An approach for creating follow-up test cases based on previous test cases
- If input x produces output f(x), then the function's "metamorphic properties" are used to guide a transformation function t, which is applied to produce a new test case input, t(x)
- We can then predict the expected value of f(t(x)) based on the value of f(x) obtained from the actual execution
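A minimal sketch of this idea in Python, using the identity sin(x) = sin(π − x) as a stand-in metamorphic property (the function under test and the property here are illustrative, not from the presentation):

```python
import math

def f(x):
    # Function under test (illustrative)
    return math.sin(x)

def t(x):
    # Metamorphic transformation: sin(pi - x) should equal sin(x)
    return math.pi - x

def metamorphic_test(f, t, relation, inputs):
    """Run f on each input and on its transformed follow-up; check the relation."""
    failures = []
    for x in inputs:
        if not relation(f(x), f(t(x))):
            failures.append(x)
    return failures

# For this property the two outputs should be (approximately) equal.
same = lambda a, b: math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
print(metamorphic_test(f, t, same, [0.1, 0.5, 1.0, 2.0]))  # → []
```

An empty failure list means the property held on every input; a non-empty list would point at inputs whose follow-up outputs violated the expected relation.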

Metamorphic Testing without an Oracle
- When a test oracle exists, we can know whether f(t(x)) is correct
  - Because we have an oracle for f(x)
  - So if f(t(x)) is as expected, then it is correct
- When there is no test oracle, f(x) acts as a "pseudo-oracle" for f(t(x))
  - If f(t(x)) is as expected, it is not necessarily correct
  - However, if f(t(x)) is not as expected, either f(x) or f(t(x)) (or both) is wrong

Metamorphic Testing Example
- Consider a program that reads a text file of test scores for students in a class, and computes the averages and the standard deviation of the averages
- If we permute the values in the text file, the results should stay the same
- If we multiply each score by 10, the final results should all be multiplied by 10 as well
- These metamorphic properties can be used to create a "pseudo-oracle" for the application
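The two properties above can be checked mechanically. A small Python sketch (the dataset and helper names are invented for illustration; scaling checks use a tolerance because of floating point):

```python
import random
import statistics

def summarize(scores_by_student):
    """Per-student averages and the standard deviation of those averages."""
    averages = [sum(s) / len(s) for s in scores_by_student]
    return averages, statistics.pstdev(averages)

data = [[90, 80, 70], [60, 80, 90], [75, 70, 95]]
avgs, sd = summarize(data)

# Property 1: permuting each student's scores leaves the results unchanged.
permuted = [random.sample(s, len(s)) for s in data]
assert summarize(permuted) == (avgs, sd)

# Property 2: multiplying every score by 10 scales both results by 10.
scaled_avgs, scaled_sd = summarize([[x * 10 for x in s] for s in data])
assert all(abs(sa - a * 10) < 1e-9 for sa, a in zip(scaled_avgs, avgs))
assert abs(scaled_sd - sd * 10) < 1e-9
```

If either assertion fired, we would know the program is defective even though no oracle told us what the correct averages were in the first place.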

Limitations of Metamorphic Testing
- Manual transformation of the input data and comparison of the outputs can be laborious and error-prone
- Comparing outputs with tools like diff is not always possible when the outputs are not expected to be exactly the same

Our Solution: Automated Metamorphic System Testing
- The tester needs to:
  - Specify the application's metamorphic properties
  - Configure the testing framework
  - Run the application with its test input
- The framework then automatically:
  - Transforms the program input data
  - Executes multiple instances of the application with different transformed inputs in parallel
  - Compares the outputs of the executions

Model

Amsterdam: Automated Metamorphic System Testing Framework
- Metamorphic properties are specified in XML:
  - Input transformation
  - Runtime options
  - Output comparison
- The framework provides out-of-the-box support for numerous transformation and comparison functions, but is extensible to support custom operations
- Additional invocations are executed in parallel in separate sandboxes, each with its own virtual execution environment [Osman et al. OSDI'02]

Empirical Studies
- To measure the effectiveness of the approach, we selected three real-world applications from the domain of supervised machine learning:
  - Support Vector Machines (SVM): vector-based classifier
  - C4.5: decision tree classifier
  - MartiRank: ranking application

Methodology (1)
- Mutation testing was used to seed defects into each application:
  - Comparison operators were reversed
  - Math operators were changed
  - Off-by-one errors were introduced
- For each program, we created multiple variants, each with exactly one mutation
- Weak mutants (that did not affect the final output) were discarded, as were those that caused outputs that were obviously wrong
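As an illustration of the kinds of mutants described (hand-written here; this is not the authors' actual mutation tool), consider an original function and two single-mutation variants:

```python
def max_value(xs):
    best = xs[0]
    for x in xs[1:]:
        if x > best:          # original comparison
            best = x
    return best

def max_value_mutant(xs):
    best = xs[0]
    for x in xs[1:]:
        if x >= best:         # mutant: ">" changed to ">=" (weak: same output)
            best = x
    return best

def max_value_offbyone(xs):
    best = xs[0]
    for x in xs[1:-1]:        # mutant: off-by-one skips the last element
        if x > best:
            best = x
    return best

print(max_value([1, 3, 5]), max_value_mutant([1, 3, 5]), max_value_offbyone([1, 3, 5]))
# → 5 5 3
```

The ">=" mutant never changes the output and would be discarded as weak; the off-by-one mutant produces a wrong but plausible-looking output, which is exactly the kind of subtle defect the study targets.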

Methodology (2)
- Each variant (containing one mutation) acted as a pseudo-oracle for itself:
  - The program was run to produce an output with the original input dataset
  - The metamorphic properties were applied to create new input datasets
  - The program was run on the new inputs to create new outputs
- If the outputs were not as expected, the mutant had been killed (i.e. the defect had been detected)

Metamorphic Properties
- Each application had four metamorphic properties specified, based on:
  - Permuting the order of the elements in the input data set
  - Multiplying the elements by a positive constant
  - Adding a constant to the elements
  - Negating the values of the elements in the input data
- Testing was conducted using our implementation of the Amsterdam framework
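The four transformations can be sketched over a flat list of numbers (the real inputs were per-application datasets; the constants and the expected relationships shown for the mean are illustrative):

```python
import random

def permute(xs):
    return random.sample(xs, len(xs))

def multiply(xs, k=3):
    return [x * k for x in xs]

def add(xs, c=10):
    return [x + c for x in xs]

def negate(xs):
    return [-x for x in xs]

data = [4, 8, 15, 16, 23, 42]
followups = {
    "permute": permute(data),
    "multiply": multiply(data),
    "add": add(data),
    "negate": negate(data),
}

# For a simple statistic like the mean, each transformation implies a
# predictable relationship between original and follow-up outputs:
mean = lambda xs: sum(xs) / len(xs)
assert mean(followups["permute"]) == mean(data)           # unchanged
assert mean(followups["multiply"]) == mean(data) * 3      # scaled
assert mean(followups["add"]) == mean(data) + 10          # shifted
assert mean(followups["negate"]) == -mean(data)           # negated
```

The same four transformations were applied to each application's input data, with the expected output relationship depending on the application's semantics.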

SVM Results
- Permuting the input was very effective at killing off-by-one mutants
- Many functions in SVM perform calculations on a set of numbers
  - Off-by-one mutants caused some element of the set to be omitted
  - By permuting, a different number would be omitted
  - The results of the calculations would be different, revealing the defect
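A toy illustration of this effect, assuming a hypothetical off-by-one defect in a mean calculation (not SVM's actual code):

```python
def buggy_mean(xs):
    total = 0.0
    for i in range(len(xs) - 1):   # off-by-one: the last element is skipped
        total += xs[i]
    return total / len(xs)

original = [10, 20, 30]
permuted = [30, 20, 10]

# No oracle tells us what the correct mean is, but permutation invariance
# is violated, so at least one of the two outputs must be wrong.
print(buggy_mean(original), buggy_mean(permuted))  # outputs differ
```

Which element gets skipped depends on the ordering, so permuting the input changes the result and exposes the defect without any knowledge of the correct answer.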

C4.5 Results
- Negating the input was very effective
- C4.5 creates a decision tree in which nodes contain clauses like "if attr_n > α then class = C"
- If the data set is negated, those nodes should change to "if attr_n ≤ -α then class = C", i.e. both the operator and the sign of α should change
- In most cases, only one of the two changes occurred
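This property check can be sketched as follows; the tuple node representation and the helper are hypothetical, not C4.5's actual data structures:

```python
# A tree node "if attr3 > 0.75 then class = C" (attribute name and
# threshold are invented for illustration).
original_node = ("attr3", ">", 0.75, "C")

def negate_node(node):
    """Expected counterpart of a node after negating the data set:
    both the operator and the sign of the threshold must change."""
    attr, op, alpha, cls = node
    return (attr, "<=" if op == ">" else ">", -alpha, cls)

expected = negate_node(original_node)
print(expected)  # → ('attr3', '<=', -0.75, 'C')

# A defective mutant that flips only the operator but not the sign:
faulty = ("attr3", "<=", 0.75, "C")
assert faulty != expected  # the comparison reveals the defect
```

A mutant that changed only the operator or only the sign produces a tree that disagrees with this expectation, so comparing the two trees kills the mutant.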

MartiRank Results
- Permuting and negating were effective at killing comparison operator mutants
- MartiRank depends heavily on sorting
- Permuting and negating change which numbers get compared and what the result should be, thus inducing differences in the final sorted list

Summary of Results
- 143 mutants killed out of 182 (78%)
- Permuting or negating the inputs proved to be effective techniques for killing mutants because of the mathematical nature of the applications
- Multiplying and adding were not effective, possibly because of the nature of the mutants we inserted

Benefits of Automation
- For SVM, all of the metamorphic properties called for the outputs to be the same as the original
- But in practice we knew they wouldn't be exactly the same
  - Partly due to floating point calculations
  - Partly due to approximations in the implementation
- We could use Heuristic Metamorphic Testing to allow for outputs that were considered "close enough" (either semantically or to within some tolerance)
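Such a "close enough" comparison might be sketched as follows (the function and tolerances are illustrative, not Amsterdam's actual API):

```python
import math

def outputs_match(expected, actual, rel_tol=1e-6, abs_tol=1e-9):
    """Heuristic comparison: element-wise 'close enough' instead of the
    exact equality a strict diff would demand."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )

# Two runs whose outputs differ only in the last decimal place:
run1 = [0.333333333, 1.414213562, 2.718281828]
run2 = [0.333333334, 1.414213563, 2.718281829]
print(outputs_match(run1, run2))  # → True
```

A line-by-line diff would flag these outputs as different; the heuristic comparison accepts them, avoiding false positives from floating point noise.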

Effect on Testing Time
- Without parallelism, metamorphic testing introduces at least 100% overhead, since the application must be run at least twice
- In our experiments on a multi-core machine, the only overhead came from creating the "sandbox" and comparing the results: less than one second for a 10MB input file

Limitations and Future Work
- Framework implementation
  - The "sandbox" only includes in-process memory and the file system, not anything external to the system
  - The framework does not yet address fault localization
- Approach
  - The approach requires some knowledge of the application to determine the metamorphic properties in the first place
  - Need to investigate applicability to other domains
  - Further applicability of Heuristic Metamorphic Testing to non-deterministic applications

Contributions
- A testing technique called Automated Metamorphic System Testing that facilitates testing of non-testable programs
- An implementation called Amsterdam
- Empirical studies demonstrating the effectiveness of the approach

Automatic System Testing of Programs without Test Oracles
Chris Murphy
cmurphy@cs.columbia.edu
http://psl.cs.columbia.edu/metamorphic

Related Work
- Pseudo-oracles [Davis & Weyuker ACM'81]
- Testing non-testable programs [Weyuker TCJ'82]
- Overview of approaches [Baresi and Young '01]
  - Embedded assertion languages
  - Extrinsic interface contracts
  - Pure specification languages
  - Trace checking & log file analysis
- Using metamorphic testing [Chen et al. JIST'02; others]

Related Work
- Applying metamorphic testing to "non-testable programs": Chen et al. ISSTA'02 (among others)
- Automating metamorphic testing: Gotlieb & Botella COMPSAC'03

Categories of Metamorphic Properties
- Additive: increase (or decrease) numerical values by a constant
- Multiplicative: multiply numerical values by a constant
- Permutative: randomly permute the order of elements in a set
- Invertive: reverse the order of elements in a set
- Inclusive: add a new element to a set
- Exclusive: remove an element from a set
- Others...
- ML applications such as ranking, classification, and anomaly detection exhibit these properties [Murphy SEKE'08]

Specifying Metamorphic Properties
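The slide's accompanying figure is not reproduced in the transcript. As a purely hypothetical sketch (element and attribute names invented here, not Amsterdam's actual schema), an XML property specification covering the three parts named earlier (input transformation, runtime options, output comparison) might look like:

```xml
<!-- Hypothetical property specification: names are illustrative only -->
<metamorphic-property name="permute">
  <transform function="PermuteLines" input="training_data.txt"/>
  <runtime command="./app -train %INPUT%"/>
  <compare function="ExactMatch" output="model.out"/>
</metamorphic-property>
```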

Further Testing
- For each application, additional data sets were used to see if more mutants could be killed
  - SVM: 18 of the remaining 19 were killed
  - MartiRank: 6 of the remaining 19 were killed
  - C4.5: the one remaining mutant was killed

Heuristic Metamorphic Testing
- Specify metamorphic properties in which the results may be "similar" but not necessarily exactly the same as predicted
- Reduces false positives by checking against a difference threshold when comparing floating point numbers
- Addresses non-determinism by specifying heuristics for what is considered "close"