An Evaluation of Mutation and Data-flow Testing: A Meta-Analysis


An Evaluation of Mutation and Data-flow Testing: A Meta-Analysis
Sahitya Kakarla, Selina Momotaz, and Akbar Siami Namin
AdVanced Empirical Software Testing and Analysis (AVESTA)
Department of Computer Science, Texas Tech University, USA
The 6th International Workshop on Mutation Analysis (Mutation 2011), Berlin, Germany, March 2011

2 Outline
 Motivation
 What we do/don't know about mutation and data-flow testing
 Research synthesis methods
 Research synthesis in software engineering
 Mutation vs. data-flow testing: a meta-analytical assessment
 Discussion
 Conclusion
 Future work

3 Motivation: What We Already Know
 We already know [1, 2, 3]:
  Mutation testing detects more faults than data-flow testing
  Mutation-adequate test suites are larger than data-flow-adequate test suites
[1] A. P. Mathur and W. E. Wong, "An empirical comparison of data flow and mutation-based adequacy criteria," Software Testing, Verification, and Reliability, 1994
[2] A. J. Offutt, J. Pan, K. Tewary, and T. Zhang, "An experimental evaluation of dataflow and mutation testing," Software Practice and Experience, 1996
[3] P. G. Frankl, S. N. Weiss, and C. Hu, "All-uses vs. mutation testing: An experimental comparison of effectiveness," Journal of Systems and Software, 1997

4 Motivation: What We Don't Know
 However, we don't know:
  The order of magnitude of the difference in fault detection between mutation and data-flow testing
  The order of magnitude of the difference in test suite size between mutation and data-flow adequacy testing

5 Motivation: What Can We Do?
 How about:
1. Taking the average number of faults detected by the mutation technique
2. Taking the average number of faults detected by the data-flow technique
3. Computing either of these:
  The mean difference
  The odds

6 Motivation: What Can We Do? (Cont'd)
 Similarly, for adequate test suites and their sizes:
1. Taking the average size of mutation-adequate test suites
2. Taking the average size of data-flow-adequate test suites
3. Computing either of these:
  The mean difference
  The odds

7 Motivation: In Fact…
 The mean difference and the odds are two measures for quantifying differences between techniques as reported in experimental studies.
 More precisely:
  The mean difference and the odds are two techniques of quantitative research synthesis
 In addition to quantitative approaches, there are qualitative techniques for synthesizing research from experimental studies:
  meta-ethnography, qualitative meta-analysis, interpretive synthesis, narrative synthesis, and qualitative systematic review

8 Motivation: The Objectives of This Paper
 A quantitative approach using meta-analysis to assess the differences between mutation and data-flow testing, based on results already reported in the literature [1, 2, 3], with respect to:
  Effectiveness: the number of faults detected by each technique
  Efficiency: the number of test cases required to build an adequate (mutation | data-flow) test suite

9 Research Synthesis Methods
 Two major methods:
  Narrative reviews
   Vote counting
  Statistical research syntheses
   Meta-analysis
 Other methods:
  Qualitative syntheses of qualitative and quantitative research
  etc.

10 Research Synthesis Methods: Narrative Reviews
 Often inconclusive compared to statistical approaches to systematic review
 Use the "vote counting" method to determine whether an effect exists
 Findings are divided into three categories:
1. Those with statistically significant results in one direction
2. Those with statistically significant results in the opposite direction
3. Those with statistically non-significant results
 Very common in the medical sciences

11 Research Synthesis Methods: Narrative Reviews (Cont'd)
 Major problems:
  Give equal weight to studies with different sample sizes and effect sizes at varying significance levels
   Misleading conclusions
  No way to determine the size of the effect
  Often fail to identify the moderator variables, or study characteristics

12 Research Synthesis Methods: Statistical Research Syntheses
 A quantitative integration and analysis of the findings from all empirical studies relevant to an issue
 Quantifies the effect of a treatment
 Identifies potential moderator variables of the effect
  Factors that may influence the relationship
 Findings from different studies are expressed in terms of a common metric called "effect size"
  Standardization enables meaningful comparison

13 Research Synthesis Methods: Statistical Research Syntheses – Effect Size
 Effect size
  The difference between the means of the experimental and control conditions divided by the standard deviation (Glass, 1976)
 Cohen's d:
  d = (x̄₁ − x̄₂) / s_pooled
 Pooled standard deviation:
  s_pooled = √[ ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2) ]
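The effect size formula above can be sketched in a few lines of Python. This is a minimal illustration; the function name and the example numbers are hypothetical, not taken from the studies in this deck.

```python
import math

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                          / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

# Hypothetical groups: treatment mean 22, control mean 18
d = cohens_d(22.0, 18.0, 5.0, 4.0, 30, 30)  # a "large" effect on Cohen's scale
```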

14 Research Synthesis Methods: Statistical Research Syntheses (Cont'd)
 Advantages over narrative reviews:
  Show the direction of the effect
  Quantify the effect
  Identify the moderator variables
  Allow computation of weights for studies

15 Research Synthesis Methods: Meta-Analysis
 The statistical analysis of a large collection of analysis results for the purpose of integrating the findings (Glass, 1976)
 Generally centered on the relation between one explanatory and one response variable
  The effect of X on Y

16 Research Synthesis Methods: Steps to Perform a Meta-Analysis
1. Define the theoretical relation of interest
2. Collect the population of studies that provide data on the relation
3. Code the studies and compute effect sizes
  Standardize the measurements reported in the articles
  Decide on a coding protocol specifying the information to be extracted from each study
4. Examine the distribution of effect sizes and analyze the impact of moderator variables
5. Interpret and report the results

17 Research Synthesis Methods: Criticisms of Meta-Analysis
 These problems are shared with narrative reviews:
  Adds and compares apples and oranges
  Ignores qualitative differences between studies
  A garbage-in, garbage-out procedure
  Considers only the significant findings that get published

18 Research Synthesis in Software Eng.: The Major Problems
 There is no clear understanding of what a representative sample of programs looks like
 The results of experimental studies are often incomparable:
  Different settings
  Different metrics
  Inadequate information
 Lack of interest in replicating experimental studies
  Lower acceptance rate for replicated studies, unless the results obtained are significantly different
 Publication bias

19 Research Synthesis in Software Eng.: Only a Few Studies
 Miller, 1998
  Applied meta-analysis to assess functional and structural testing
 Succi, 2000
  A study on a weighted estimator of common correlation technique for meta-analysis in software engineering
 Manso, 2008
  Applied meta-analysis to the empirical validation of UML class diagrams

20 Mutation vs. Data-flow Testing: A Meta-Analytical Assessment
 Three papers were selected and coded:
  A. P. Mathur and W. E. Wong, "An empirical comparison of data flow and mutation-based adequacy criteria," Software Testing, Verification, and Reliability, 1994
  A. J. Offutt, J. Pan, K. Tewary, and T. Zhang, "An experimental evaluation of dataflow and mutation testing," Software Practice and Experience, 1996
  P. G. Frankl, S. N. Weiss, and C. Hu, "All-uses vs. mutation testing: An experimental comparison of effectiveness," Journal of Systems and Software, 1997

21 Mutation vs. Data-flow Testing: A Meta-Analytical Assessment
 A. P. Mathur and W. E. Wong, "An empirical comparison of data flow and mutation-based adequacy criteria," Software Testing, Verification, and Reliability, 1994

22 Mutation vs. Data-flow Testing: A Meta-Analytical Assessment
 A. J. Offutt, J. Pan, K. Tewary, and T. Zhang, "An experimental evaluation of dataflow and mutation testing," Software Practice and Experience, 1996

23 Mutation vs. Data-flow Testing: A Meta-Analytical Assessment
 P. G. Frankl, S. N. Weiss, and C. Hu, "All-uses vs. mutation testing: An experimental comparison of effectiveness," Journal of Systems and Software, 1997

24 Mutation vs. Data-flow Testing: The Moderator Variables

Variable    Description
LOC         Lines of code
No. Faults  Number of faults used
NM          Number of mutants generated
NEX         Number of executable def-use pairs
NTC         Number of test cases required for achieving adequacy
PRO         Proportion of test cases detecting faults, or proportion of faults detected

25 Mutation vs. Data-flow Testing: The Result of Coding

Study Reference      Language        LOC   No. Faults
Mathur & Wong, 1994  Fortran/C       ~40   NA
Offutt et al., 1996  Fortran/C       ~18   60
Frankl et al., 1997  Fortran/Pascal  ~39   NA

Mutation:
Study Reference      No. Mutants  No. Test Cases  Proportion
Mathur & Wong, 1994  ~954         ~22             NA
Offutt et al., 1996  ~667         ~18             ~92%
Frankl et al., 1997  ~1812        ~63.6           ~69%

Data-flow:
Study Reference      No. Executable Def-use  No. Test Cases  Proportion
Mathur & Wong, 1994  ~72                     ~6.6            NA
Offutt et al., 1996  ~40                     ~4              ~76%
Frankl et al., 1997  ~73                     ~50.3           ~58%

26 Mutation vs. Data-flow Testing: The Meta-Analysis Technique Used
 The inverse variance method was used
 The average effect size across all studies is a "weighted mean"
  Larger studies with less variation weigh more
 wᵢ = 1 / (τ̂² + v̂ᵢ), where:
  i: the i-th study
  τ̂²: the estimated between-study variance
  v̂ᵢ: the estimated within-study variance for the i-th study

27 Mutation vs. Data-flow Testing: The Meta-Analysis Technique Used (Cont'd)
 The inverse variance method
  As defined in the Mantel-Haenszel technique
  Uses a weighted average of the individual study effects as the effect size
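The inverse-variance pooling described on the last two slides can be sketched as follows. This is a minimal illustration with an assumed function name; setting tau_sq = 0 gives fixed-effect weights, and a positive tau_sq gives random-effects weights.

```python
import math

def inverse_variance_pool(effects, within_vars, tau_sq=0.0):
    """Pool study effects with weights w_i = 1 / (tau^2 + v_i)."""
    weights = [1.0 / (tau_sq + v) for v in within_vars]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))  # standard error of the pooled effect
    return pooled, se

# log(OR) effect sizes from the efficiency studies; the variances are hypothetical
pooled, se = inverse_variance_pool([1.383, 1.662, 0.548], [0.25, 0.35, 0.10])
```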

28 Mutation vs. Data-flow Testing: Treatment & Control Groups
 Efficiency (to avoid a negative odds ratio):
  Control group: data-flow data group
  Treatment group: mutation data group
 Effectiveness (to avoid a negative odds ratio):
  Control group: mutation data group
  Treatment group: data-flow data group

29 Mutation vs. Data-flow Testing: The Odds Ratios Computed

Efficiency:
Study Reference      Odds Ratio (OR)  95% CI         Effect Size log(OR)
Mathur & Wong, 1994  3.99             (1.59, 10.02)  1.383
Offutt et al., 1996  5.27             (1.71, 16.19)  1.662
Frankl et al., 1997  1.73             (0.98, 3.04)   0.548
Fixed                2.6              (1.69, 4)      0.955
Random               2.94             (1.43, 6.03)   1.078

Effectiveness:
Study Reference      Odds Ratio (OR)  95% CI         Effect Size log(OR)
Offutt et al., 1996  3.63             (1.54, 8.55)   1.289
Frankl et al., 1997  1.61             (0.90, 2.88)   0.476
Fixed                2.12             (1.32, 3.41)   0.751
Random               2.27             (1.03, 4.99)   0.819

Cohen's scaling: effect sizes up to 0.2, 0.5, and 0.8 are small, medium, and large, respectively
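For reference, an odds ratio and its Wald confidence interval can be computed from a 2x2 table as sketched below. This is a generic illustration with hypothetical counts, not the actual data behind the tables above.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table (a/b = treatment successes/failures,
    c/d = control successes/failures) with a Wald 95% CI on the log scale."""
    or_ = (a * d) / (b * c)
    log_or = math.log(or_)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(log_or - z * se), math.exp(log_or + z * se))
    return or_, ci, log_or

# Hypothetical: 40 of 50 faults detected (treatment) vs. 25 of 50 (control)
or_, ci, log_or = odds_ratio_ci(40, 10, 25, 25)
```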

30 Mutation vs. Data-flow Testing: The Forest Plots

31 Mutation vs. Data-flow Testing: Homogeneity & Publication Bias
 We need to test whether the variation in the computed effects is due to randomness alone
  Testing the homogeneity of the studies
  Cochran's chi-square test, or Q-test
  A high Q rejects the null hypothesis that the studies are homogeneous
 Q = 4.37, with a p-value that gives no evidence to reject the null hypothesis
 Funnel plots: a symmetric plot indicates that the homogeneity of the studies is maintained
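The Q statistic itself is straightforward to compute from per-study effects and within-study variances. This is an illustrative sketch; the function name is assumed and the inputs in the example are hypothetical.

```python
def cochran_q(effects, within_vars):
    """Cochran's Q: weighted squared deviation of study effects from the
    fixed-effect pooled mean, with weights w_i = 1 / v_i. Compared against
    a chi-square distribution with k - 1 degrees of freedom."""
    weights = [1.0 / v for v in within_vars]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))

# Identical effects give Q = 0 (perfect homogeneity)
q = cochran_q([1.383, 1.662, 0.548], [0.25, 0.35, 0.10])
```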

32 Mutation vs. Data-flow Testing: Publication Bias - Funnel Plots

33 Mutation vs. Data-flow Testing: A Meta-Regression on Efficiency
 Examine how the factors (moderator variables) affect the observed effect sizes in the chosen studies
 Apply weighted linear regression
  The weights are the study weights computed for each study
 The moderator variables in our studies:
  Number of mutants (No.Mut)
  Number of executable data-flow coverage elements, e.g. def-use pairs (No.Exe)

34 Mutation vs. Data-flow Testing: A Meta-Regression on Efficiency (Cont'd)
 The number of predictors (three):
  The intercept
  The number of mutants (No.Mut)
  The number of executable coverage elements (No.Exe)
 The number of observations: three papers
 # predictors = # observations
  Not possible to fit a linear regression with an intercept
  Possible to fit a linear regression without an intercept

35 Mutation vs. Data-flow Testing: A Meta-Regression on Efficiency (Cont'd)
 The p-values are considerably larger than 0.05
 No evidence that No.Mut and No.Exe have a significant influence on the effect size

Coefficients (estimated value, standard error, t-value, p-value per predictor):
 No. Mutants
 No. Executable def-use pairs

Summary Statistics
Residual Standard Error  0.652
Multiple R-Squared       0.959
Adjusted R-Squared       0.877
F-Statistic              11.73
p-value                  0.202
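The no-intercept weighted least-squares fit described above can be sketched with NumPy. The moderator values below are taken from the coding table, but the study weights are hypothetical stand-ins for the inverse-variance weights.

```python
import numpy as np

# Effect sizes log(OR) per study, and the two moderators
y = np.array([1.383, 1.662, 0.548])
X = np.column_stack([
    [954.0, 667.0, 1812.0],  # No.Mut: number of mutants
    [72.0, 40.0, 73.0],      # No.Exe: executable def-use pairs
])
w = np.array([0.4, 0.35, 0.25])  # hypothetical study weights

# Weighted least squares with no intercept: solve (X' W X) beta = X' W y
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```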

36 Mutation vs. Data-flow Testing: A Meta-Regression on Effectiveness
 The number of predictors (three):
  The intercept
  The number of mutants (No.Mut)
  The number of executable coverage elements (No.Exe)
 The number of observations: two papers
 # predictors > # observations
  Not possible to fit a linear regression (with or without an intercept)

37 Conclusion
 A meta-analytical assessment of mutation and data-flow testing:
  Mutation is at least two times more effective than data-flow testing (odds ratio = 2.27)
  Mutation is almost three times less efficient than data-flow testing (odds ratio = 2.94)
 No evidence that the number of mutants or the number of executable coverage elements influences the effect size

38 Future Work
 We missed two related papers:
  Offutt and Tewary, "Empirical comparison of data-flow and mutation testing," 1992
  N. Li, U. Praphamontripong, and J. Offutt, "An experimental comparison of four unit test criteria: Mutation, edge-pair, all-uses, and prime path coverage," Mutation 2009, DC, USA
 A group of my students is replicating an experiment for Java similar to the above paper
 Further replications are required
 Applications of other meta-analysis measures, e.g. Cohen's d and Hedges' g, may be of interest

39 Thank You
The 6th International Workshop on Mutation Analysis (Mutation 2011), Berlin, Germany, March 2011