August Shi, Wing Lam, Reed Oei, Tao Xie, Darko Marinov

Presentation transcript:

iFixFlakies: A Framework for Automatically Fixing Order-Dependent Flaky Tests
August Shi, Wing Lam, Reed Oei, Tao Xie, Darko Marinov
ESEC/FSE 2019, Tallinn, Estonia, 8/29/2019
CNS-1513939, CNS-1564274, CNS-1646305, CNS-1740916, CCF-1763788, CCF-1816615, OAC-1839010

Order-Dependent (OD) Flaky Tests
Tests cannot always run in the same order
  JUnit tests ran in a different order after upgrading from Java 6 to Java 7 [1, 2, 3]
  Regression testing techniques (prioritization, selection, etc.) can lead to different orders
[Diagram: the code under test is exercised by the same tests in different orders, e.g. test0, test1, test2, …, testn versus test0, test2, test1, …, testn]
[1] http://intellijava.blogspot.com/2012/05/junit-and-java-7.html
[2] http://www.java-allandsundry.com/2013/01/
[3] https://coderanch.com/t/600985/engineering/Maintaining-order-JUnit-tests-JDK

Example 1 (from WildFly)
  public class WritableServiceBasedNamingStoreTestCase {
    @Test
    public void testBind() throws Exception {
      final Name name = new CompositeName("test");
      final Object value = new Object();
      ...
      assertEquals(value, store.lookup(name));
    }
    @Test
    public void testPermissions() throws Exception {
      final String name = "a/b";
      store.bind(new CompositeName(name), value);
State-setter: makes the brittle pass if run before it
Brittle: fails if run before the state-setter
The state-setter has a fix!
[Diagram: two orders, testBind then testPermissions, and testPermissions then testBind. How to fix?]
[Speaker note: define "fix". While this seems to be a relatively simple case where we can directly fix with the relevant state-setter, there is a more complicated case.]
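The brittle/state-setter relationship can be sketched without JUnit. The following is a minimal simulation, not the WildFly code: `STORE`, `stateSetter`, `brittle`, and `lastResult` are hypothetical names, and the shared map only stands in for whatever state the real tests share.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

final class BrittleDemo {
    // Shared state that survives between tests in one run (assumed stand-in for a shared store).
    static final Map<String, Object> STORE = new HashMap<>();

    // Hypothetical state-setter: binds a value as a side effect and passes.
    static boolean stateSetter() {
        STORE.put("a/b", "value");
        return true;
    }

    // Hypothetical brittle test: passes only if the binding already exists.
    static boolean brittle() {
        return STORE.containsKey("a/b");
    }

    // Runs the tests in the given order from a fresh state and returns the last test's result.
    static boolean lastResult(List<BooleanSupplier> order) {
        STORE.clear();
        boolean result = true;
        for (BooleanSupplier test : order) {
            result = test.getAsBoolean();
        }
        return result;
    }

    public static void main(String[] args) {
        // Brittle alone: fails. State-setter first: brittle passes.
        System.out.println(lastResult(List.of(BrittleDemo::brittle)));
        System.out.println(lastResult(List.of(BrittleDemo::stateSetter, BrittleDemo::brittle)));
    }
}
```

In this sketch the "fix" is simply whatever part of `stateSetter` sets up the state that `brittle` needs.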

Example 2 (from ElasticJob)
  public class ShutdownListenerManagerTest {
    @Test
    public void assertIsShutdownAlready() {
      shutdownListenerManager.new InstanceShutdownStatusJobListener()
          .dataChanged("/test_job/instances/127.0.0.1@-@0", ...);
      verify(schedulerFacade, times(0)).shutdownInstance();
    }
    @Test
    public void assertRemoveLocalInstancePath() {
      JobRegistry.getInstance().registerJob("test_job", ...);
      shutdownListenerManager.new InstanceShutdownStatusJobListener().
      verify(schedulerFacade).shutdownInstance();
Victim: fails if run after the polluter
Polluter: makes the victim fail if run before it
[Diagram: two orders, assertIsShutdownAlready then assertRemoveLocalInstancePath, and assertRemoveLocalInstancePath then assertIsShutdownAlready. How to fix?]

A Fix is also in the Test Suite!
Cleaner: makes the victim pass when run between the polluter and the victim
  public class FailoverServiceTest {
    @Test
    public void assertGetFailoverItems() {
      JobRegistry.getInstance().registerJob("test_job", ...);
      when(jobNodeStorage.getJobNodeChildrenKeys(...)).thenReturn(...);
      when(jobNodeStorage.isJobNodeExisted(...)).thenReturn(true);
      when(jobNodeStorage.isJobNodeExisted(...)).thenReturn(false);
      when(jobNodeStorage.getJobNodeDataDirectly(...)).thenReturn(...);
      assertThat(...);
      verify(jobNodeStorage).getJobNodeChildrenKeys(...);
      verify(jobNodeStorage).isJobNodeExisted(...);
      verify(jobNodeStorage).getJobNodeDataDirectly(...);
      JobRegistry.getInstance().shutdown("test_job");
    }
[Diagram: order ShutdownListenerManagerTest.assertRemoveLocalInstancePath, then FailoverServiceTest.assertGetFailoverItems, then ShutdownListenerManagerTest.assertIsShutdownAlready]
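The polluter, victim, and cleaner roles can be illustrated with a small simulation. This is only a sketch under assumptions: `REGISTRY` is a hypothetical stand-in for shared state such as a job registry, and the three methods are invented for illustration rather than taken from ElasticJob.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CleanerDemo {
    // Hypothetical shared state that persists across tests in one run.
    static final Set<String> REGISTRY = new HashSet<>();

    // Polluter: registers a job and never removes it.
    static void polluter() {
        REGISTRY.add("test_job");
    }

    // Cleaner: registers the same job but shuts it down before finishing.
    static void cleaner() {
        REGISTRY.add("test_job");
        REGISTRY.remove("test_job");
    }

    // Victim: passes only when no job is left registered.
    static boolean victim() {
        return !REGISTRY.contains("test_job");
    }

    // Runs the earlier tests from a fresh state, then reports whether the victim passes.
    static boolean victimPassesAfter(List<Runnable> earlierTests) {
        REGISTRY.clear();
        for (Runnable test : earlierTests) {
            test.run();
        }
        return victim();
    }

    public static void main(String[] args) {
        System.out.println(victimPassesAfter(List.of()));                       // victim in isolation
        System.out.println(victimPassesAfter(List.of(CleanerDemo::polluter)));  // after polluter
        System.out.println(victimPassesAfter(
                List.of(CleanerDemo::polluter, CleanerDemo::cleaner)));         // cleaner in between
    }
}
```

The victim passes in isolation, fails after the polluter, and passes again once the cleaner runs in between, which is the pattern the slide describes.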

iFixFlakies
Test suites often already have logic for setting/resetting state for OD tests!
We can fix OD tests by using the code the test suites already have
iFixFlakies automates this process
https://sites.google.com/view/ifixflakies

iFixFlakies Overview
[Diagram: iFixFlakies takes a passing order and a failing order of the same tests (test0, test1, test2 in different orders) and outputs patches. Speaker note: at least one of each order.]

iFixFlakies Overview
[Diagram: inside iFixFlakies, the passing order and failing order go through the Minimizer and then the Patcher, which outputs patches.]

Minimizer Overview
Input: a passing order, a failing order
Determine the OD test kind by running the OD test in isolation
  Passing means victim, failing means brittle
Find polluters from the failing order, state-setters from the passing order
Find cleaners for polluter/victim pairs
Output: polluters/cleaners/state-setters


Minimizer (Determine OD Test Kind)
[Diagram labels: V, T3]
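The classification rule from the Minimizer overview (run the OD test in isolation; passing means victim, failing means brittle) fits in a few lines. This is a sketch only: `OdTestKind`, `Kind`, and `classify` are hypothetical names, and the supplier is assumed to run the OD test alone in a fresh environment.

```java
import java.util.function.BooleanSupplier;

final class OdTestKind {
    enum Kind { VICTIM, BRITTLE }

    // runInIsolation is assumed to execute the OD test by itself and return true if it passed.
    static Kind classify(BooleanSupplier runInIsolation) {
        // Passes alone, so something earlier must break it: victim.
        // Fails alone, so something earlier must set it up: brittle.
        return runInIsolation.getAsBoolean() ? Kind.VICTIM : Kind.BRITTLE;
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> true));
        System.out.println(classify(() -> false));
    }
}
```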

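Minimizer (Find Polluter)
[Diagram: subsets of the tests T0, T1, T2 that precede victim V in the failing order are rerun before V until the polluter P is identified.]
Can be configured to find multiple polluters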
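One way to picture the polluter search is a bisection over the tests that precede the victim in the failing order. The sketch below is a simplification, not the tool's actual search: it assumes a single test is enough to pollute, and `findPolluter` and the `victimFails` oracle are hypothetical names. The oracle is assumed to run the given tests followed by the victim in a fresh environment and report whether the victim failed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class PolluterFinder {
    // Returns one polluter from prefix, or null if none can be isolated by halving.
    static String findPolluter(List<String> prefix, Predicate<List<String>> victimFails) {
        if (prefix.isEmpty() || !victimFails.test(prefix)) {
            return null; // nothing in this prefix makes the victim fail
        }
        List<String> current = new ArrayList<>(prefix);
        while (current.size() > 1) {
            int mid = current.size() / 2;
            List<String> first = new ArrayList<>(current.subList(0, mid));
            List<String> second = new ArrayList<>(current.subList(mid, current.size()));
            if (victimFails.test(first)) {
                current = first;
            } else if (victimFails.test(second)) {
                current = second;
            } else {
                return null; // pollution needs tests from both halves; outside this sketch
            }
        }
        return current.get(0);
    }

    public static void main(String[] args) {
        // Toy oracle: the victim fails whenever T1 ran before it.
        Predicate<List<String>> victimFails = tests -> tests.contains("T1");
        System.out.println(findPolluter(List.of("T0", "T1", "T2"), victimFails)); // prints T1
    }
}
```

Finding state-setters for a brittle would follow the same shape with the oracle inverted (the brittle passes instead of the victim failing).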

Minimizer (Find Cleaner)
[Diagram: victim V, polluter P, and the other tests T0, T2, T4]
All tests are candidate cleaners
[Speaker note: all tests are candidates to try, because finding cleaners is very important for iFixFlakies to work!]

Minimizer (Find Cleaner)
[Diagram: each candidate (T0, T2, T4) is run between polluter P and victim V; a candidate that makes V pass is a cleaner C.]
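The cleaner search can be sketched as a loop that places each candidate between the polluter and the victim. The names `findCleaners` and `lastTestPasses` are hypothetical; the oracle is assumed to run the given order in a fresh environment and report whether the last test (the victim) passed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class CleanerFinder {
    // Tries every candidate in the order [polluter, candidate, victim].
    static List<String> findCleaners(String polluter, String victim, List<String> candidates,
                                     Predicate<List<String>> lastTestPasses) {
        List<String> cleaners = new ArrayList<>();
        for (String candidate : candidates) {
            if (lastTestPasses.test(List.of(polluter, candidate, victim))) {
                cleaners.add(candidate); // the victim passed, so this candidate reset the state
            }
        }
        return cleaners;
    }

    public static void main(String[] args) {
        // Toy oracle: only T2 resets the polluted state.
        Predicate<List<String>> lastTestPasses = order -> order.get(1).equals("T2");
        System.out.println(findCleaners("P", "V", List.of("T0", "T2", "T4"), lastTestPasses)); // prints [T2]
    }
}
```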

Patcher Overview
Input: polluters/cleaners/state-setters
Make a copy of the cleaner/state-setter as a new helper method, to be minimized
  Also include setup/teardown code
Add a call to the helper method in the OD test
Delta debug the statements in the helper method until the OD test passes
Attempt to inline the minimized statements into the OD test
Output: one patch per cleaner/state-setter
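The statement-minimization step can be approximated by a one-at-a-time reduction. This is a simpler stand-in for delta debugging, not the algorithm the tool uses: `minimize` and `odTestPasses` are hypothetical names, statements are modeled as plain strings, and the oracle is assumed to rebuild the helper from the given statements, run the OD test in the failing order, and return false if the code does not compile or the test fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class HelperMinimizer {
    // Assumes odTestPasses holds for the full statement list.
    static List<String> minimize(List<String> statements, Predicate<List<String>> odTestPasses) {
        List<String> kept = new ArrayList<>(statements);
        boolean changed = true;
        while (changed) {                       // repeat until no single statement can be dropped
            changed = false;
            for (int i = 0; i < kept.size(); i++) {
                List<String> trial = new ArrayList<>(kept);
                trial.remove(i);                // drop the statement at index i
                if (odTestPasses.test(trial)) {
                    kept = trial;               // the OD test still passes without it
                    changed = true;
                    i--;                        // re-check the statement that shifted into index i
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> helper = List.of("registerJob", "when", "assertThat", "verify", "shutdown");
        // Toy oracle: the OD test passes as long as the shutdown call remains.
        System.out.println(minimize(helper, stmts -> stmts.contains("shutdown"))); // prints [shutdown]
    }
}
```

The result is minimal in the sense that removing any single remaining statement makes the oracle fail; full delta debugging would reach a comparable result while trying larger chunks first.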

Patcher Example
  public class FailoverServiceTest {
    @Test
    public void assertGetFailoverItems() {
      JobRegistry.getInstance().registerJob("test_job", ...);
      when(jobNodeStorage.getJobNodeChildrenKeys(...)).thenReturn(...);
      when(jobNodeStorage.isJobNodeExisted(...)).thenReturn(true);
      when(jobNodeStorage.isJobNodeExisted(...)).thenReturn(false);
      when(jobNodeStorage.getJobNodeDataDirectly(...)).thenReturn(...);
      assertThat(...);
      verify(jobNodeStorage).getJobNodeChildrenKeys(...);
      verify(jobNodeStorage).isJobNodeExisted(...);
      verify(jobNodeStorage).getJobNodeDataDirectly(...);
      JobRegistry.getInstance().shutdown("test_job");
    }

Patcher Example
The cleaner is copied into a helper method:
  public class FailoverServiceTest {
    public void helper() {
      JobRegistry.getInstance().registerJob("test_job", ...);
      when(jobNodeStorage.getJobNodeChildrenKeys(...)).thenReturn(...);
      when(jobNodeStorage.isJobNodeExisted(...)).thenReturn(true);
      when(jobNodeStorage.isJobNodeExisted(...)).thenReturn(false);
      when(jobNodeStorage.getJobNodeDataDirectly(...)).thenReturn(...);
      assertThat(...);
      verify(jobNodeStorage).getJobNodeChildrenKeys(...);
      verify(jobNodeStorage).isJobNodeExisted(...);
      verify(jobNodeStorage).getJobNodeDataDirectly(...);
      JobRegistry.getInstance().shutdown("test_job");
    }
The OD test calls the helper:
  public class ShutdownListenerManagerTest {
    @Test
    public void assertIsShutdownAlready() {
      new FailoverServiceTest().helper();
      ...
    }}
Try to inline this statement; the OD test after inlining:
  public class ShutdownListenerManagerTest {
    @Test
    public void assertIsShutdownAlready() {
      JobRegistry.getInstance().shutdown("test_job");
      ...
    }}
[Diagram: order assertRemoveLocalInstancePath, then helper + assertIsShutdownAlready]
[Speaker note: failing to compile is also a reason to reject an intermediate candidate.]

Experimental Setup
OD tests taken from a public dataset collected in our prior work [1]
  10 projects, 110 OD tests
RQ1: Breakdown of kinds of OD tests?
RQ2: Characteristics of patches?
RQ3: Running time of iFixFlakies?
[1] Lam et al., "iDFlakies: A Framework for Detecting and Partially Classifying Flaky Tests". ICST 2019.

RQ1: Breakdowns
Over half can be automatically fixed!
[Table; blank cells were not captured in this transcript]
Columns: Project - Module, # OD Tests, # Victims, # Brittles, # Victims w/ Cleaners
alibaba/fastjson 11 4 7 1
apache/incubator-dubbo
  - m1
  - m2 3
  - m3
  - m4
apache/jackrabbit-oak 2
apache/struts
dropwizard/dropwizard
elasticjob/elastic-job-lite 6 5
jfree/jfreechart
kevinsawicki/http-request 28
undertow-io/undertow
wildfly/wildfly 44 43
Total/Average 110 100 10 48

RQ2: Patch Characteristics
Average # patches: 25.2; average # unique patches: 3.3
  Many cleaners/state-setters result in the same patch
First patch: avg. # stmts 1.9, avg. % stmts from original 22.6%
All patches: avg. # stmts 1.8, avg. % stmts from original 24.6%
  Final patches are small parts of the original cleaner/state-setter test (69.5% of patches consist of only one statement!)
  The first patch obtained is similar in size to all others
Recommendation: only generate a handful of patches

RQ2: Patches as Pull Requests
Submitted pull requests for 56 of 58 OD tests
  2 of 58 were already fixed before our work
Pull requests for 21 of 56 tests accepted so far
  The remaining are pending, nothing rejected
A patch for a brittle in WildFly also fixed 43 victims that have no cleaners!

RQ3: Runtime of iFixFlakies
* = spent time determining no cleaners
[Table; blank cells were not captured in this transcript]
Columns: Project - Module, Test suite time (s), Avg. time to find first (s): polluter, cleaner, state-setter, patch
alibaba/fastjson 203 92 *523 42 299
apache/incubator-dubbo
  - m1 8 22 52 n/a 294
  - m2 206 143 178 21 130
  - m3 1 2 4 395
  - m4 3 19 *104
apache/jackrabbit-oak 189 218 *5,710 50 416
apache/struts 13 11 411
dropwizard/dropwizard 7 36 16 714
elasticjob/elastic-job-lite 24 45 *293 258
jfree/jfreechart *1,695
kevinsawicki/http-request 15 5 56
undertow-io/undertow 17 32 79 232
wildfly/wildfly *114 463
Total/Average 35 29 176 37 186

RQ3: Runtime of iFixFlakies (cont'd)
* = spent time determining no cleaners
[Table; blank cells were not captured in this transcript]
Columns: Project - Module, Test suite time (s), Avg. time to find all (s): polluters, cleaners, state-setters, patches
alibaba/fastjson 203 113 *1,443 1,748 25,424
apache/incubator-dubbo
  - m1 8 48 465 n/a 2,473
  - m2 206 389 7,089 39 54,856
  - m3 1 2 25 572
  - m4 3 19 *104
apache/jackrabbit-oak 189 218 *5,710 50 416
apache/struts 4 13 159 14,478
dropwizard/dropwizard 7 57 589 4,960
elasticjob/elastic-job-lite 24 45 *1,434 11,854
jfree/jfreechart 22 16 *1,695
kevinsawicki/http-request 15 227 56
undertow-io/undertow 17 109 1,025 2,617
wildfly/wildfly 21 *114 186 4,749
Total/Average 35 592 1,154 9,737
Recommendation: do not spend time searching for all patches

Conclusions
Order-dependent (OD) tests can often be automatically fixed
  Test suites often have logic that can fix them!
iFixFlakies can automatically find and minimize the fixes for these tests
  Automatically fixed 58 out of 110 OD tests
  Pull requests for 21 of 56 OD tests accepted, remaining are pending, nothing rejected
awshi2@illinois.edu
https://sites.google.com/view/ifixflakies

I will be on the job market! August Shi http://mir.cs.illinois.edu/awshi2/

BACKUP

RQ1

RQ2

RQ2 cont.

RQ3