WERST – Methodology Group
Outline
  Taxonomies of studies
  Experiment design
  Reporting studies
  Research directions
Types of studies
  Controlled experiments vs. field studies
    –Advantages: internal validity, duration
    –Drawbacks: scale, external validity
  Academic vs. industrial
    –Academic studies are a good first step: ballpark figures, refined hypotheses, additional information
  Comparing vs. combining techniques
    –Evaluating single techniques
    –Choosing one technique out of several alternatives
    –Ultimate goal should be to combine techniques
  Human-based studies vs. simulations
    –Advantage: account for human factors (learning curve, error proneness)
    –Drawbacks: statistical issues (variation, sample size, etc.), bias
Design of Studies
  Fault sampling
    –Mutant generation
    –Experts seeding faults
    –Actual project faults
  Test case selection
    –Automatic generation versus manual test cases
    –Sampling from test pools versus humans
    –Guidance to human subjects: maximum benefit versus expected benefit
  Selection of subject programs
    –Characterize, classify, describe subjects
    –Relevant aspects?
    –Concurrency, distribution, embedded, PL, dev. methodology, complexity and size
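As an illustration of the "mutant generation" option under fault sampling, the following is a minimal sketch of a single mutation operator (replacing `+` with `-`) applied to a toy program. The function names `arithmetic_mutants` and `killed`, the subject function `total`, and the choice of operator are all hypothetical and for illustration only; real mutation systems apply many more operators.

```python
import ast
import copy

def is_add(node):
    """True for a binary '+' expression node."""
    return isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add)

def arithmetic_mutants(source):
    """Return one mutant source string per '+' site, with that '+' replaced by '-'."""
    tree = ast.parse(source)
    n_sites = sum(is_add(node) for node in ast.walk(tree))
    mutants = []
    for target in range(n_sites):
        mutant = copy.deepcopy(tree)
        # ast.walk visits nodes in a deterministic order, so the index
        # 'target' refers to the same site in every copy of the tree.
        sites = [node for node in ast.walk(mutant) if is_add(node)]
        sites[target].op = ast.Sub()
        mutants.append(ast.unparse(mutant))  # ast.unparse needs Python 3.9+
    return mutants

def killed(mutant_source, args, expected):
    """A test case kills a mutant if the mutant's output differs from the expected one."""
    namespace = {}
    exec(mutant_source, namespace)
    return namespace["total"](*args) != expected

# Hypothetical subject program with two '+' sites, hence two mutants.
SOURCE = "def total(a, b, c):\n    return a + b + c\n"
MUTANTS = arithmetic_mutants(SOURCE)

# The test case total(1, 2, 3) == 6 kills both mutants;
# the test case total(1, 0, 0) == 1 kills neither.
strong = [killed(m, (1, 2, 3), 6) for m in MUTANTS]
weak = [killed(m, (1, 0, 0), 1) for m in MUTANTS]
```

The contrast between the two test cases shows why the choice of test inputs matters as much as the choice of seeded faults when a study reports detection rates.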
Design of Studies II
  Baselines of comparisons in empirical studies
    –"Random" selection of test sets
    –Current practice
    –Alternative techniques
  Statistical variation in the performance of testing techniques
    –Techniques and criteria are not deterministic
    –Human factors
    –Location, type of faults, subject programs
    –Context: severity, risk
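The "random" baseline and the variation it exhibits can be sketched as a small simulation: draw many random test sets of a fixed size from a test pool and record the fault detection rate of each. The data structure `KILLS` (which test detects which fault), the function names, and all numbers below are hypothetical, made up only to show the procedure.

```python
import random
import statistics

def detection_rate(test_set, kills, n_faults):
    """Fraction of the n_faults faults detected by at least one test in test_set."""
    detected = set()
    for test in test_set:
        detected |= kills[test]
    return len(detected) / n_faults

def random_baseline(kills, n_faults, size, trials=1000, seed=0):
    """Detection rates of 'trials' random test sets of 'size' tests drawn from the pool."""
    rng = random.Random(seed)  # fixed seed so the simulation can be replicated
    pool = sorted(kills)
    return [
        detection_rate(rng.sample(pool, size), kills, n_faults)
        for _ in range(trials)
    ]

# Hypothetical test pool: test name -> set of fault ids that the test detects.
KILLS = {
    "t1": {1, 2},
    "t2": {2},
    "t3": {3},
    "t4": set(),
    "t5": {1, 4},
}

rates = random_baseline(KILLS, n_faults=4, size=2, trials=200, seed=42)
summary = {
    "min": min(rates),
    "mean": statistics.mean(rates),
    "max": max(rates),
}
```

Reporting the whole distribution (or at least its spread) rather than a single number makes it possible to judge whether a technique's result lies outside what random selection of the same size would achieve.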
Reporting Studies
  Subject programs: concurrency, distribution, embedded, PL, dev. methodology, complexity and size
  Fault sampling: selection or seeding procedure
  Human participants: training, background
  Data collection procedures, e.g., effort
  Experiment design:
    –Hypotheses
    –Human: groups, task assignments, order of execution
    –Simulation: test pools, procedure for deriving test sets from pool
  Threats to validity
Conclusions
  Properly reporting studies is key
  Decide on proper designs for replicability and meta-analyses
  Need for multiple kinds of studies to answer a research question
Future Directions
  Mutants versus real faults: What is the relationship?
  Guidelines and templates for conducting empirical studies: industrial, academic
    –Group assignments, training
    –Data collection: metrics and procedures
    –Fault sampling
    –Subject selection
    –Statistical analysis: statistical inference testing, meta-analysis
    –Qualitative analysis, e.g., faults
  Guidelines and templates for reporting empirical studies: industrial, academic
    –How to ensure replicability?
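For the "statistical inference testing" item, one possible procedure (the slides do not prescribe a specific test) is a two-sided permutation test on the difference in mean fault detection rate between two techniques. This is a minimal sketch; the function name `permutation_test` and the sample data are hypothetical.

```python
import random
import statistics

def permutation_test(a, b, trials=10000, seed=0):
    """Monte Carlo two-sided permutation test on the difference of means.

    Returns an estimated p-value for the null hypothesis that the two
    samples come from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # random reassignment of results to techniques
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # The +1 terms count the observed labelling itself, so p is never zero.
    return (extreme + 1) / (trials + 1)

# Hypothetical detection rates from six runs of each of two techniques.
technique_a = [0.90, 0.92, 0.95, 0.91, 0.93, 0.94]
technique_b = [0.50, 0.52, 0.55, 0.51, 0.53, 0.54]

p_different = permutation_test(technique_a, technique_b, trials=2000, seed=1)
p_same = permutation_test(technique_a, technique_a, trials=2000, seed=1)
```

A permutation test is used here because it makes no normality assumption, which suits the small samples and skewed distributions common in testing experiments; other tests may be more appropriate for a given design.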
Proposals
  White paper on how to perform and report empirical studies of software testing
  Web repositories
  Competitions on benchmarks
  Software testing questions:
    –Cost-effectiveness: How to measure the costs and benefits of testing?
    –Model-driven test automation and strategies
    –Tailoring techniques to specifics of development processes and application domains
    –Prioritizing black-box, system-level regression test cases
    –Correlation/synergy among test techniques (faults)
  Meta-questions
    –Generalization / prediction
    –Appropriate taxonomies of faults
    –Consistency of results between lab and industrial contexts
    –Improve the relevance of mutation systems