Unit Testing in Java with an Emphasis on Concurrency Corky Cartwright Rice and Halmstad Universities Summer 2013
Software Engineering Culture ● Three Guiding Visions – Data-driven design – Test-driven development – Mostly functional coding (no gratuitous mutation) ● Codified in the Design Recipe taught in How to Design Programs by Felleisen et al. (available for free online: [first edition], [second edition]) and Elements of Object-Oriented Design (available online). The target languages are Scheme and Java.
Moore’s Law
Extrapolate the Future
Timeliness ● CPU clock frequencies stagnate ● Multi-core CPUs provide additional processing power, but multiple threads are needed to use multiple cores. ● Writing concurrent programs is difficult!
Tutorial Outline ● Introduce unit testing in single-threaded (deterministic) setting using lists ● Demonstrate problems introduced by concurrency and their impact on unit testing ● Show how some of the most basic problems can be overcome by using the right policies and tools.
(Sequential) Unit Testing ● Unit tests … – Test parts of the program (including the whole!) – Integrate with program development; commits to the repository must pass all unit tests – Automate testing during the maintenance phase – Serve as documentation – Prevent bugs from recurring – Help keep the code repository clean ● Effective with a single thread of control
Universal Test-Driven Design Recipe ● Analyze the problem: define the data and determine top-level operations. Give sample data values. ● Define type signatures, contracts, and headers for all top-level operations. In Java, the type signature is part of the header. ● Give input-output examples, including critical boundary cases, for each operation. ● Write a template for each operation, typically based on structural decomposition of the primary argument (the receiver in OO methods). ● Code each method by filling in the templates. ● Test every method (using the I/O examples!) and ascertain that every method is tested on a sufficient set of examples. White-box testing matters!
Sequential Case Studies: Functional Lists and Bi-Lists ● A List is either – Empty(), or – Cons(e, l) where e is an E and l is a List ● A BiList is a mutable data structure containing a possibly empty sequence of objects of type E that can be traversed in either direction using a BiListIterator.
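The functional list definition and the design recipe can be sketched in plain Java as follows. This is a minimal illustration, not the course's actual code: the names FList, Empty, Cons, length and append are assumptions chosen here (FList avoids a clash with java.util.List), and the input-output examples are written as plain checks rather than JUnit tests so the sketch needs only the JDK.

```java
/** A list of E is either Empty or Cons(first, rest). Immutable: fields are never reassigned. */
interface FList<E> {
    int length();                       // number of elements
    FList<E> append(FList<E> other);    // this list followed by other
}

/** The empty list: base case of every structural recursion. */
final class Empty<E> implements FList<E> {
    public int length() { return 0; }
    public FList<E> append(FList<E> other) { return other; }
    @Override public String toString() { return "()"; }
}

/** A non-empty list: one element plus the rest of the list. */
final class Cons<E> implements FList<E> {
    private final E first;
    private final FList<E> rest;

    Cons(E first, FList<E> rest) { this.first = first; this.rest = rest; }

    // Template: combine first with the recursive result on rest.
    public int length() { return 1 + rest.length(); }
    public FList<E> append(FList<E> other) { return new Cons<E>(first, rest.append(other)); }
    @Override public String toString() { return "(" + first + " " + rest + ")"; }
}

/** Input-output examples from the recipe, including the boundary case (the empty list). */
class FListExamples {
    public static void main(String[] args) {
        FList<Integer> empty = new Empty<Integer>();
        FList<Integer> one = new Cons<Integer>(1, empty);
        FList<Integer> twoThree = new Cons<Integer>(2, new Cons<Integer>(3, empty));
        System.out.println(empty.length());          // boundary case
        System.out.println(one.append(twoThree));    // structural recursion on the receiver
    }
}
```

Because the data are immutable and the methods are deterministic, each example has exactly one expected output, which is what makes sequential unit tests meaningful.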
Review Elements of Sequential Unit Testing ● Unit tests depend on deterministic behavior ● Known input, expected output… – Success → correct behavior – Failure → flawed code ● Outcome of test is meaningful if test is deterministic
Problems Due to Concurrency ● Thread scheduling is nondeterministic and machine-dependent – Code may be executed under different schedules – Different schedules may produce different results ● Known input, expected output(s?)… – Success → correct behavior in this schedule, may be flawed in another schedule – Failure → flawed code ● Success of unit test is meaningless
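A small sketch of schedule-dependent results, assuming nothing beyond the JDK: two threads increment a shared counter without any synchronization. The class and method names (LostUpdateDemo, runOnce) are hypothetical. Since counter++ is a read-modify-write sequence, increments may be lost under some schedules, so repeated runs may report different totals.

```java
/** Two threads increment a shared, unsynchronized counter (a deliberate data race). */
class LostUpdateDemo {
    static int counter;  // shared; intentionally neither volatile nor protected by a lock

    /** Runs both threads to completion and returns the final counter value. */
    static int runOnce(final int incrementsPerThread) {
        counter = 0;
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter++;  // read, add, write: another thread may interleave
                }
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter;
    }

    public static void main(String[] args) {
        // The total is at most 200000; how much is lost depends on the schedule.
        for (int run = 0; run < 5; run++) {
            System.out.println("run " + run + ": " + runOnce(100000));
        }
    }
}
```

A unit test that asserts the total equals twice the per-thread count may pass on one machine and fail on another, which is exactly why a passing result says little here.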
Recommended Resources on Concurrent Programming in Java ● Explicit Concurrency: – Comp 402 web site from 2009 – Brian Goetz, Java Concurrency in Practice (available online at this website) ● Coping with Multicore: emerging parallel extensions of Java/Scala that guarantee determinism (in a designated subset), do not require explicit synchronization, and avoid JMM issues – Habanero Java – Habanero Scala ● Success of a non-deterministic unit test is not very meaningful
Problems Due to Java Memory Model ● JMM is MUCH weaker than sequential consistency ● Writes to shared data may be held pending indefinitely unless the target is declared volatile or is shielded by the same lock as subsequent reads. ● Why not always use locking (synchronized)? – Significant overhead – Increases likelihood of deadlock ● Extremely difficult to reason about program execution for specific inputs because so many schedules are allowed. ● A model that accommodates compiler writers rather than software developers.
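The visibility rule can be illustrated with a stop flag. This is a sketch with hypothetical names (VolatileFlagDemo, workerStopsWithin): because the flag is volatile, the write made by one thread is guaranteed to become visible to the spinning worker. If the volatile modifier were removed, the JMM would permit the worker to keep reading a stale value, and the loop might never end.

```java
/** A worker spins on a flag that another thread sets. */
class VolatileFlagDemo {
    // volatile: a write by one thread is guaranteed to be seen by later reads in other threads
    private volatile boolean stopRequested = false;

    void requestStop() { stopRequested = true; }

    /** Starts a spinning worker, requests a stop, and reports whether it terminated in time. */
    static boolean workerStopsWithin(long timeoutMillis) {
        final VolatileFlagDemo demo = new VolatileFlagDemo();
        Thread worker = new Thread(new Runnable() {
            public void run() {
                while (!demo.stopRequested) {
                    // busy-wait until the flag change becomes visible
                }
            }
        });
        worker.setDaemon(true);   // do not keep the JVM alive if the worker never stops
        worker.start();
        demo.requestStop();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStopsWithin(5000));
    }
}
```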
Hidden Pitfalls in Using JUnit to Test Concurrent Java ● JUnit Is Completely Broken for Concurrent Code Units: – Fails to detect exceptions and failed assertions in threads other than the main thread (!) – Fails to detect if an auxiliary thread is still running when the main thread terminates; all execution is aborted when the main thread terminates. – Fails to ensure that all auxiliary threads were joined by the main thread before termination. (In Habanero Java, all programs are implicitly enclosed in a comprehensive join called finish(), but not in Java.)
Possible Solutions to Concurrent Testing Problems ● Programming Language Features – Ensure that bad things cannot happen; perhaps ensure determinism (reducing testing to sequential semantics!) – May restrict programmers ● Comprehensive Testing – Test whether bad things happen in any schedule – All schedules may be too stringent for programs involving GUIs – Does not limit the space of solutions, but the testing burden is greatly increased. – Good testing tools are essential.
Coping with the Java Memory Model ● Avoid using synchronized and minimize the size of synchronized blocks to reduce likelihood of deadlock. ● Identify all classes that can be shared and make all fields in such classes either final or volatile. Ensures sequential consistency (almost). ● Array elements are still technically a problem because they cannot be marked as volatile. The ConcurrentUtilities library includes a special form of array with volatile elements.
Improvements to JUnit ● Uncaught exceptions and failed assertions – Not caught in child threads ● ConcJUnit, developed by my former graduate student Mathias Ricken, fixes all of the problems with JUnit. – Developed for Java 6; Java 7 not yet supported. ● Mathias developed some other tools to help test concurrent programs, but none of them has yet reached production quality (e.g., random delays/yields). Research idea: JVM from Hell. ● Presumably easy to use the ConcJUnit jar instead of JUnit in Eclipse. Designed for drop-in compatibility with JUnit 4.7.
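The array problem can be sketched with the JDK's own java.util.concurrent.atomic.AtomicIntegerArray, whose get and set have volatile read and write semantics per element. This is a stand-in chosen for illustration; it is not the ConcurrentUtilities array mentioned above, whose API is not shown here. The class and method names (VolatileElementsDemo, publishAndRead) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

/** A shared class whose only field is final and whose array elements behave like volatiles. */
class VolatileElementsDemo {
    // final field; element reads and writes go through get/set with volatile semantics
    private final AtomicIntegerArray slots = new AtomicIntegerArray(4);

    /** A writer thread publishes a nonzero value in slot 2; the caller waits until it sees it. */
    static int publishAndRead(final int nonZeroValue) {
        final VolatileElementsDemo shared = new VolatileElementsDemo();
        Thread writer = new Thread(new Runnable() {
            public void run() {
                shared.slots.set(2, nonZeroValue);   // volatile-style write to one element
            }
        });
        writer.start();
        // With a plain int[] this loop would not be guaranteed to observe the write.
        while (shared.slots.get(2) == 0) {
            Thread.yield();
        }
        return shared.slots.get(2);
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead(42));
    }
}
```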
Sample JUnit Tests
import junit.framework.TestCase;
public class Test extends TestCase {
  public void testException() {
    throw new RuntimeException("booh!");
  }
  public void testAssertion() {
    assertEquals(0, 1);  // in effect: if (0 != 1) throw new AssertionFailedError();
  }
}
Both tests fail.
Problematic JUnit Tests
import junit.framework.TestCase;
public class Test extends TestCase {
  public void testException() {
    new Thread(new Runnable() {
      public void run() {
        throw new RuntimeException("booh!");
      }
    }).start();
  }
}
Timeline: the main thread spawns the child thread and reaches the end of the test (success!); the exception thrown in the child thread is uncaught.
Problematic JUnit Tests (same testException test) ● The child thread throws an uncaught exception; the test should fail but does not!
Problematic JUnit Tests
import junit.framework.TestCase;
public class Test extends TestCase {
  public void testFailure() {
    new Thread(new Runnable() {
      public void run() {
        fail("This thread fails!");
      }
    }).start();
  }
}
The failed assertion surfaces as an uncaught AssertionFailedError in the child thread; the test should fail but does not!
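The underlying behavior is a property of Java threads, not only of JUnit, and can be observed without any test framework. In this minimal sketch (the names ChildThreadExceptionDemo and spawnerCompletesNormally are hypothetical), the spawning thread even waits for the child, yet the child's exception is never rethrown in the spawner. The default handler only prints the stack trace to standard error.

```java
/** An exception thrown in a child thread does not propagate to the thread that started it. */
class ChildThreadExceptionDemo {

    /** Returns true if the spawning code completed normally despite the child's exception. */
    static boolean spawnerCompletesNormally() {
        Thread child = new Thread(new Runnable() {
            public void run() {
                throw new RuntimeException("booh!");
            }
        });
        try {
            child.start();
            child.join();    // waiting for the child does not rethrow its exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (RuntimeException e) {
            return false;    // not reached: the exception stays in the child thread
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("spawner completed normally: " + spawnerCompletesNormally());
    }
}
```

A test runner that only watches for exceptions in the thread running the test method therefore reports success.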
Thread Group for JUnit Tests
import junit.framework.TestCase;
public class Test extends TestCase {
  public void testException() {
    new Thread(new Runnable() {
      public void run() {
        throw new RuntimeException("booh!");
      }
    }).start();
  }
}
Diagram: the test thread and the child thread belong to the same thread group. The child's uncaught exception invokes the group's uncaught exception handler, which is checked when the test ends.
Thread Group for JUnit Tests (same testException test) ● Timeline: the main thread spawns the test thread and waits; the test thread spawns the child thread; the child's exception is uncaught and invokes the group's handler; the test ends; the main thread resumes and checks the group's handler: failure!
Improvements to JUnit ● Uncaught exceptions and failed assertions – Not caught in child threads ● Thread group with exception handler – JUnit test runs in a separate thread, not main thread – Child threads are created in same thread group – When test ends, check if handler was invoked ● Detection of uncaught exceptions and failed assertions in child threads that occurred before test’s end – Past tense: occurred!
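The thread-group idea can be sketched with the JDK alone. This is an illustration of the technique, not ConcJUnit's implementation; RecordingGroupDemo, RecordingGroup and runInGroup are hypothetical names. A ThreadGroup subclass overrides uncaughtException, the test body runs in its own thread inside that group, and threads it creates join the same group by default.

```java
/** Runs a test body in a dedicated thread group whose handler records uncaught throwables. */
class RecordingGroupDemo {

    /** Thread group that remembers the first uncaught throwable of any thread in it. */
    static class RecordingGroup extends ThreadGroup {
        private Throwable firstUncaught;

        RecordingGroup(String name) { super(name); }

        @Override
        public synchronized void uncaughtException(Thread t, Throwable e) {
            if (firstUncaught == null) { firstUncaught = e; }
        }

        synchronized Throwable firstUncaught() { return firstUncaught; }
    }

    /** Runs testBody in a fresh group, waits for it, and returns what the handler saw (or null). */
    static Throwable runInGroup(Runnable testBody) {
        RecordingGroup group = new RecordingGroup("test-group");
        Thread testThread = new Thread(group, testBody, "test-thread");
        testThread.start();
        try {
            testThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return group.firstUncaught();   // the check made when the test ends
    }

    /** A test body that spawns a failing child and waits for it before returning. */
    static Runnable spawnFailingChildAndJoin() {
        return new Runnable() {
            public void run() {
                Thread child = new Thread(new Runnable() {   // inherits the test thread's group
                    public void run() { throw new RuntimeException("booh!"); }
                });
                child.start();
                try {
                    child.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
    }

    public static void main(String[] args) {
        System.out.println("handler saw: " + runInGroup(spawnFailingChildAndJoin()));
    }
}
```

The child is joined here so that its failure occurs before the end of the test; only failures that have already occurred at that point are detected.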
Child Thread Outlives Parent
import junit.framework.TestCase;
public class Test extends TestCase {
  public void testException() {
    new Thread(new Runnable() {
      public void run() {
        throw new RuntimeException("booh!");
      }
    }).start();
  }
}
Timeline when the child fails before the test ends: the main thread spawns the test thread and waits; the test thread spawns the child thread; the child's exception is uncaught and invokes the group's handler; the test ends; the main thread resumes and checks the group's handler: failure!
Child Thread Outlives Parent (same testException test) ● Timeline when the child outlives the test: the main thread spawns the test thread and waits; the test thread spawns the child thread; the test ends; the main thread resumes and checks the group's handler: success! Only then does the child's uncaught exception invoke the group's handler. Too late!
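The "too late" case can be made reproducible in a sketch by holding the child back with a latch until the test has ended. All names here (LateChildDemo, RecordingGroup, run) are hypothetical, and the latch exists only to force the bad schedule deterministically; in a real test the same ordering would arise by chance.

```java
import java.util.concurrent.CountDownLatch;

/** A child thread that fails only after the test has ended escapes the end-of-test check. */
class LateChildDemo {

    /** Thread group whose handler records whether any uncaught throwable was seen. */
    static class RecordingGroup extends ThreadGroup {
        private boolean invoked;
        RecordingGroup(String name) { super(name); }
        @Override public synchronized void uncaughtException(Thread t, Throwable e) { invoked = true; }
        synchronized boolean invoked() { return invoked; }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns { handler invoked at test end, handler invoked after the child finished }. */
    static boolean[] run() {
        final RecordingGroup group = new RecordingGroup("test-group");
        final CountDownLatch testEnded = new CountDownLatch(1);
        final Thread[] childHolder = new Thread[1];

        Thread testThread = new Thread(group, new Runnable() {
            public void run() {
                Thread child = new Thread(new Runnable() {
                    public void run() {
                        try {
                            testEnded.await();          // forced schedule: fail after the test
                        } catch (InterruptedException e) {
                            return;
                        }
                        throw new RuntimeException("booh!");
                    }
                });
                childHolder[0] = child;
                child.start();
                // no join(): the test method returns while the child is still running
            }
        }, "test-thread");

        testThread.start();
        joinQuietly(testThread);
        boolean seenAtTestEnd = group.invoked();        // the runner's check: nothing yet
        testEnded.countDown();                          // now the child throws
        joinQuietly(childHolder[0]);
        boolean seenLater = group.invoked();            // handler was invoked, but too late
        return new boolean[] { seenAtTestEnd, seenLater };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("at test end: " + r[0] + ", after child finished: " + r[1]);
    }
}
```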
Enforced Join
import junit.framework.TestCase;
public class Test extends TestCase {
  public void testException() throws InterruptedException {
    Thread t = new Thread(new Runnable() {
      public void run() {
        throw new RuntimeException("booh!");
      }
    });
    t.start();
    …
    t.join();  // the test does not end until the child thread has finished
  }
}
Testing Using ConcJUnit ● Replacement for junit.jar, or plugin JAR for JUnit 4.7; compatible with Java 6 (not 7 or 8) ● Available as binary and source ● Results from DrJava’s unit tests – Child thread for communication with slave VM still alive in test – Several reader and writer threads still alive in low-level test (calls to join() missing) ● DrJava currently does not use ConcJUnit – Tests based on a custom-made class extending junit.framework.TestCase – Does not check if join() calls are missing
Conclusion ● Improved JUnit now detects problems in other threads – Only in chosen schedule – Needs schedule-based execution ● Annotations ease documentation and checking of concurrency invariants – Open-source library of Java API invariants ● Support programs for schedule-based execution
Future Work ● Adversary scheduling using delays/yields (JVM from Hell) ● Schedule-Based Execution (Impractical?) – Replay stored schedules – Generate representative schedules – Dynamic race detection (which races are bugs?) – Randomized schedules (JVM from Hell) ● Support annotations from Floyd-Hoare logic – Declare and check contracts (preconditions & postconditions for methods) – Declare and check class invariants
Extra Slides
Tractability of Comprehensive Testing ● Test all possible schedules – Concurrent unit tests meaningful again ● Number of schedules (N) – t: # of threads, s: # of slices per thread – N = (ts)! / (s!)^t
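To get a feel for how quickly the number of schedules grows, the count can be evaluated with BigInteger. The sketch assumes the closed form N = (ts)! / (s!)^t, which is what the product of s-combinations described on the "Extra: Number of Schedules" slide reduces to; the class and method names (ScheduleCount, schedules) are hypothetical.

```java
import java.math.BigInteger;

/** Counts interleavings of t threads with s time slices each: N = (t*s)! / (s!)^t. */
class ScheduleCount {

    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    /** t: number of threads, s: number of time slices per thread. */
    static BigInteger schedules(int t, int s) {
        return factorial(t * s).divide(factorial(s).pow(t));
    }

    public static void main(String[] args) {
        System.out.println(schedules(2, 2));    // 6
        System.out.println(schedules(3, 2));    // 90
        System.out.println(schedules(2, 10));   // 184756
    }
}
```

Even two threads with ten slices each already give 184756 schedules, which is why reducing the number of points at which threads can interleave matters.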
Extra: Number of Schedules ● Product of s-combinations – For thread 1: choose s out of ts time slices – For thread 2: choose s out of ts−s time slices – … – For thread t−1: choose s out of 2s time slices – For thread t: choose s out of s time slices ● Write the s-combinations using factorials ● Cancel out terms in each denominator and the next numerator ● Left with (ts)! in the numerator and t factors of s! in the denominator
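Written out, the cancellation described above looks as follows. The formula images are missing from the extracted slide, so this is a reconstruction from the bullet points, using the same symbols t and s:

```latex
\begin{align*}
N &= \binom{ts}{s}\,\binom{ts-s}{s}\cdots\binom{2s}{s}\,\binom{s}{s} \\
  &= \frac{(ts)!}{s!\,(ts-s)!}\cdot\frac{(ts-s)!}{s!\,(ts-2s)!}\cdots
     \frac{(2s)!}{s!\,s!}\cdot\frac{s!}{s!\,0!} \\
  &= \frac{(ts)!}{(s!)^{t}}
\end{align*}
```

Each factor's (ts - ks)! in the denominator cancels against the numerator of the next factor, leaving (ts)! on top and one s! per thread below.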
Tractability of Comprehensive Testing ● If program is race-free, we do not have to simulate all thread switches – Threads interfere only at “critical points”: lock operations, shared or volatile variables, etc. – Code between critical points cannot affect outcome – Simulate all possible arrangements of blocks delimited by critical points ● Run dynamic race detection in parallel – Lockset algorithm (e.g. Eraser by Savage et al)
Critical Points Example (diagram): Thread 1 and Thread 2 each work on a local variable and on a shared variable guarded by a lock, in the sequence lock, access, unlock. ● All accesses to the shared variable are protected by the lock ● Local variables don’t need locking
Fewer Schedules ● Fewer critical points than thread switches – Reduces number of schedules – Example: two threads, but no communication → N = 1 ● Unit tests are small – Reduces number of schedules ● Hopefully comprehensive simulation is tractable – If not, heuristics are still better than nothing
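The diagram's lock, access, unlock pattern can be sketched in Java as follows. The names (CriticalPointsDemo, addSquares, runTwoThreads) are hypothetical, and ReentrantLock is used here only to make the critical points explicit; a synchronized block would serve equally well. The computation on the local variable can interleave freely with the other thread without changing the result; only the locked section is a critical point.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Threads interact only at the lock and unlock operations around the shared variable. */
class CriticalPointsDemo {
    private final ReentrantLock lock = new ReentrantLock();
    private int shared = 0;   // every access is protected by lock

    void addSquares(int n) {
        int local = 0;                    // local variable: no locking needed
        for (int i = 1; i <= n; i++) {
            local += i * i;               // code between critical points
        }
        lock.lock();                      // critical point
        try {
            shared += local;              // protected access
        } finally {
            lock.unlock();                // critical point
        }
    }

    int read() {
        lock.lock();
        try {
            return shared;
        } finally {
            lock.unlock();
        }
    }

    /** Two threads each add 1^2 + ... + n^2 to the shared variable. */
    static int runTwoThreads(final int n) {
        final CriticalPointsDemo demo = new CriticalPointsDemo();
        Runnable work = new Runnable() {
            public void run() { demo.addSquares(n); }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return demo.read();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(3));   // 28 in every schedule
    }
}
```

Here only the order of the two locked sections distinguishes one schedule from another, and both orders give the same total.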
Limitations ● Improvements only check chosen schedule – A different schedule may still fail – Requires comprehensive testing to be meaningful ● May still miss uncaught exceptions – Specify absolute parent thread group, not relative – Cannot detect uncaught exceptions in a program’s uncaught exception handler (JLS limitation)
Extra: Limitations ● May still miss uncaught exceptions – Specify absolute parent thread group, not relative (rare): Koders.com has 913 matches for ThreadGroup vs. 49,329 matches for Thread – Cannot detect uncaught exceptions in a program’s uncaught exception handler (JLS limitation): Koders.com has 32 method definitions for the uncaughtException method
Extra: DrJava Statistics (table; values not recoverable from the extracted text) ● Unit tests: passed, failed, not run ● Invariants: met, failed, % failed ● KLOC ● “event thread”