Assessing the Influence of Multiple Test Case Selection on Mutation Experiments. Marcio E. Delamaro and Jeff Offutt, George Mason University & Universidade de São Paulo.


Presentation transcript:

Slide 1 of 17: Assessing the Influence of Multiple Test Case Selection on Mutation Experiments
Marcio E. Delamaro and Jeff Offutt
George Mason University (USA) & Universidade de São Paulo (Brazil)

Slide 2 of 17: A Recent Experimental Procedure (Mutation 2014 © Delamaro & Offutt)
Diagram: create the mutants M of program P; add tests to T until MS = 100%, creating a “universe” of tests.

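Slide 3 of 17: Experimental Procedure
Diagram: the mutants of P are grouped by operator (M_op1, M_op2, …, M_op75); a test set adequate for each group is selected from T (T_op1, T_op2, …, T_op75), and each of these test sets is then run against the full mutant set M.
Callout: “Only one test set? Not good enough!”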

Slide 4 of 17: Additional Test Sets
Diagram: as before, but N test sets are now selected for each operator (T_op1-1, …, T_op1-i, …, T_op1-N; T_op2-1, …, T_op2-N; …; T_op75-1, …, T_op75-N), and each is run against M.

Slide 5 of 17: Multiple Test Sets
Perceived benefit: individual test sets may vary in effectiveness because of the specific test values they contain. Generating N test sets may overcome that variance.
1) How many test sets are needed?
2) Do additional test sets really help?
Does reality match perception?

Slide 6 of 17: Answering the Question
We decided to answer these questions by measuring the mutation score of each of 10 test sets and studying their variance.

Slide 7 of 17: Experimental Setup
Subjects: 39 C programs
– 1 to 20 functions (189 total)
– 7 to 390 LOC (2853 total)
Mutation tool: Proteum
– 104 to 11,100 mutants (66,480 total)
– We used mutation score (MS) as a proxy for effectiveness
Tests: hand-constructed test sets that kill all non-equivalent mutants (the test universe U)
– 5 to 142 tests (814 total)
– Equivalence determined by hand
– 3 to 2062 equivalent mutants (7829 total)
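Mutation score, used here as the proxy for effectiveness, is conventionally the fraction of non-equivalent mutants that a test set kills, which is why the equivalent mutants identified by hand matter. A minimal sketch, assuming mutants are identified by integer ids and the killed and equivalent sets are already known; the function name `mutation_score` is hypothetical:

```python
def mutation_score(total_mutants: int, killed: set[int], equivalent: set[int]) -> float:
    """Fraction of non-equivalent mutants killed by a test set.

    Assumes `killed` and `equivalent` are sets of mutant ids; an
    equivalent mutant can never be killed, so the sets must be disjoint.
    """
    if killed & equivalent:
        raise ValueError("an equivalent mutant cannot be killed")
    non_equivalent = total_mutants - len(equivalent)
    if non_equivalent <= 0:
        raise ValueError("no non-equivalent mutants")
    return len(killed) / non_equivalent
```

For example, a test set that kills 81 of 100 mutants, 10 of which are equivalent, scores 81 / 90 = 0.9.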

Slide 8 of 17: Collecting Data
For each program:
1. Generated statement deletion (SSDL) mutants
2. Created 10 sets of tests to kill all SSDL mutants
– All tests taken from the universe U
– Tests picked in random order from U
3. Measured the size of each test set
4. Computed the MS of each test set on all mutants
5. Collected statistics of distribution and central tendency for each test set: mean, median, min, max, standard deviation
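Steps 2 to 4 can be sketched as below. This is an illustration under stated assumptions, not the authors' tooling: `kills` is a hypothetical map from each test in U to the set of mutant ids it kills, and a randomly drawn test is kept only if it kills at least one SSDL mutant not yet killed (the slide does not say whether non-contributing tests were kept). All function names are hypothetical.

```python
import random

def select_adequate_set(universe, kills, targets, rng):
    """Draw tests from `universe` in random order until every mutant in
    `targets` (the non-equivalent SSDL mutants) is killed.
    Assumption: a test is kept only if it kills a not-yet-killed target."""
    order = list(universe)
    rng.shuffle(order)
    remaining = set(targets)
    chosen = []
    for test in order:
        if not remaining:
            break
        newly_killed = kills[test] & remaining
        if newly_killed:
            chosen.append(test)
            remaining -= newly_killed
    if remaining:
        raise ValueError("universe is not adequate for the target mutants")
    return chosen

def score_on_all(tests, kills, non_equivalent):
    """MS of a test set over all non-equivalent mutants (step 4)."""
    killed = set().union(*(kills[t] for t in tests)) & non_equivalent
    return len(killed) / len(non_equivalent)

def repeat_experiment(universe, kills, ssdl_targets, non_equivalent, n_sets=10, seed=0):
    """Return (size, MS) for each of `n_sets` SSDL-adequate test sets (steps 2-4)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sets):
        tests = select_adequate_set(universe, kills, ssdl_targets, rng)
        results.append((len(tests), score_on_all(tests, kills, non_equivalent)))
    return results
```

Because each draw uses a different random order, the resulting test sets can differ in both size (RQ2) and mutation score on all mutants (RQ1).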

Slide 9 of 17: Research Questions
RQ1: How different are different SSDL-adequate test sets in terms of mutation score?
RQ2: How different are different SSDL-adequate test sets in terms of cost (number of tests)?

Slide 10 of 17: Biggest and Smallest
Table (numeric values not recoverable): columns Program, LOC, SD, and MS: Max – Min, for six programs plus the average.
Caption: Largest and smallest spreads in mutation scores of SSDL-adequate test sets over all mutants.
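The per-program spread measures in this table (SD and Max – Min), together with the other statistics from step 5, can be computed from the 10 mutation scores of one program. A minimal sketch using Python's `statistics` module; whether the authors used the sample or the population standard deviation is not stated, so the sample SD here is an assumption, and `summarize_scores` is a hypothetical name:

```python
import statistics

def summarize_scores(scores):
    """Distribution and central-tendency statistics for one program's
    mutation scores (one score per SSDL-adequate test set)."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
        "sd": statistics.stdev(scores),  # sample SD (an assumption)
        "max_min": max(scores) - min(scores),  # the "MS: Max - Min" spread
    }
```

For instance, hypothetical scores of 0.90, 0.92, and 0.94 give a mean of 0.92, an SD of 0.02, and a Max – Min spread of 0.04.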

Slide 11 of 17: Program Size vs. Spread
Is the spread correlated with program size?
– Spearman rank correlation compares the rankings of two series of numbers: 1 or -1 means they are perfectly correlated (positively or negatively); 0 means no correlation.
– LOC and SD: -0.65
– LOC and Max – Min: -0.63
Strong negative correlations: larger programs show smaller spreads.
Good news for experimentalists… Creating 10 test sets for a 10-line program is easy. Creating 10 test sets for a 1000-line program is impractical!
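Spearman's coefficient is Pearson's correlation computed on ranks rather than on raw values, so it captures any monotonic relationship between, say, LOC and SD across the 39 programs. A minimal pure-Python sketch, assuming tied values receive the average of their ranks (the usual convention); the function names are hypothetical, and if SciPy is available, `scipy.stats.spearmanr` is a library alternative:

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks matter, `spearman([1, 2, 3, 4], [1, 4, 9, 16])` is exactly 1 even though the relationship is not linear.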

Slide 12 of 17: Average Spread
Average minimum MS: 0.9093
Average maximum MS: 0.9338
MS: Max – Min: 0.0245
SD: 0.0071
One-way ANOVA: no statistically significant differences among the means.
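A one-way ANOVA compares the variation between group means with the variation within groups; the ratio is the F statistic. The slide does not say how the observations were grouped, so the groups here are simply lists of mutation scores (for example, one list per test set). A minimal stdlib sketch that returns F and its degrees of freedom; `one_way_anova_f` is a hypothetical name, and `scipy.stats.f_oneway` computes the same statistic plus a p-value if SciPy is available:

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over `groups`."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need >= 2 non-empty groups and more observations than groups")
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]
    # Between-group variation: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("F undefined when there is no within-group variation")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For the toy groups `[1, 2, 3]`, `[2, 3, 4]`, `[3, 4, 5]`, the between-group mean square is 6 / 2 = 3 and the within-group mean square is 6 / 6 = 1, so F = 3.0 with (2, 6) degrees of freedom.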

Slide 13 of 17: Threats to Validity
Representativeness of programs
– Different sources, different domains
Size of programs
– Most studies of this nature are related to unit testing
– Large programs would be impractical
Manual steps
– Constructing the universe of tests
– Identifying equivalent mutants
A single comparison point: SSDL mutation
– Other criteria could be used
We used 10 sets
– Would results be different with 5 or 100?

Slide 14 of 17: Conclusions
Previous researchers assumed that selecting only one adequate test set could bias their results, so they created multiple test sets. But this assumption was made without evidence!

Slide 15 of 17: Key Findings
We found significant differences among different test sets for some programs, but not all.
The differences were no longer statistically significant when averaged over all 39 programs.
The differences were smaller for larger programs.

Slide 16 of 17: Recommendations
If only a few small subjects are used, use multiple test sets.
If many or larger subjects are used, don't bother.

Slide 17 of 17: Contact
Jeff Offutt, Marcio Delamaro