Comparison of Unit-Level Automated Test Generation Tools
Shuang Wang
Co-authored with Jeff Offutt
April 4,
Motivation
Automated test data generation tools
▫Reduce time and effort
▫Easier to maintain
▫Encapsulate knowledge of how to design and implement high-quality tests
We have more software, but insufficient resources, so we need to be more efficient
Frameworks like JUnit provide empty boxes
▫Hard question: what do we put in there?
What's Available Out There?
What are our criteria?
▫Free
▫Unit-level
▫Automated test generation
▫Java
Two commercial tools: AgitarOne, JTest
Three open-source tools: JCrasher, TestGen4j, JUB
Experiment Goals and Design
Compare three unit-level automatic test data generators and evaluate them based on their mutation scores
▫Subjects: three free automated testing tools - JCrasher, TestGen4j, and JUB
▫Control groups: Edge Coverage and Random Test
▫Metric: mutation score results
Experiment Design
[Diagram: muJava generates mutants from program P. Five test sets - generated by JCrasher, TestGen4J, and JUB, plus manually created Random and Edge Coverage sets - are each run against the mutants, yielding a mutation score for each approach.]
Java Programs Used

Name             LOC   Methods   Traditional Mutants   Class Mutants
BoundedStack
Inventory
Node
Recipe
Twelve
VendingMachine
Queue
TrashAndTakeOut
Total
Subjects (Automatic Test Data Generators)
▫JCrasher: tries to "crash" the program
▫TestGen: exercises boundary value testing
▫JUB: uses values like 0s and nulls
Control groups
▫Edge Coverage: one of the weakest and most basic test criteria
▫Random Test: the "weakest effort" testing strategy
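The three strategies can be sketched on a hypothetical method under test (the method, its name, and the concrete inputs are illustrative assumptions, not output of the actual tools):

```java
import java.util.Random;

public class StrategySketch {
    // Hypothetical method under test (not one of the study's programs).
    static int discount(int amount) {
        if (amount >= 100) {
            return amount / 10;
        }
        return 0;
    }

    public static void main(String[] args) {
        // JUB-style default inputs: zeros and "empty" values.
        System.out.println(discount(0));   // prints 0
        // TestGen-style boundary values straddle the predicate.
        System.out.println(discount(99));  // prints 0
        System.out.println(discount(100)); // prints 10
        // Random-style values may or may not cross the boundary.
        System.out.println(discount(new Random().nextInt(200)));
    }
}
```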
muJava
▫Create mutants
▫Run tests
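What muJava does can be sketched by hand: a mutation operator makes one small syntactic change, and a test "kills" the mutant if its output differs from the original's. The class and methods below are illustrative, written in the style of an AORB (arithmetic operator replacement) mutant, not actual muJava output:

```java
public class MutantDemo {
    // Original method.
    static int add(int a, int b) {
        return a + b;
    }

    // AORB-style mutant: '+' replaced by '-' (hand-written here;
    // muJava generates such mutants automatically).
    static int addMutant(int a, int b) {
        return a - b;
    }

    public static void main(String[] args) {
        // This input kills the mutant: outputs differ (5 vs -1).
        System.out.println(add(2, 3));       // prints 5
        System.out.println(addMutant(2, 3)); // prints -1
        // This input does NOT kill it: 0 + 0 == 0 - 0.
        System.out.println(add(0, 0) == addMutant(0, 0)); // prints true
    }
}
```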
Results & Findings

Percentage of Mutants Killed
Tool       Traditional   Class   Total
JCrasher   42%           46%     42%
TestGen    28%           37%     29%
JUB        24%           26%     24%
EC         66%           67%     66%
Random     36%           34%     36%
Results & Findings

Tool Efficiency
Tool       # Tests   # Killed   Killed / Tests
JCrasher
TestGen
JUB
EC
Random
Results & Findings

Traditional Mutants
Operator   Mutants   JCrasher   TestGen   JUB    Edge Coverage   Random
AORB       56        32%        21%       30%    66%             29%
AORS       11        46%        27%       36%    55%             27%
AOIU       66        46%        32%       17%    79%             36%
AOIS       438       28%        24%       22%    53%             21%
AODU       1         100%
ROR        256       61%        25%       17%    79%             57%
COR        12        33%        25%              58%             33%
COD        6         33%                  17%    50%             33%
COI        4         75%                  50%    75%
LOI        126       53%        48%       44%    80%             48%
Total      976       42%        28%       24%    66%             36%
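The spread between Edge Coverage and the generators is easiest to see on a ROR (relational operator replacement) mutant: only a boundary input distinguishes the mutant from the original, which boundary-oriented tests reach far more reliably than random values. The predicate below is an illustrative sketch, not code from the study's subject programs:

```java
public class RorDemo {
    // Original predicate.
    static boolean canPop(int size) {
        return size > 0;
    }

    // ROR-style mutant: '>' replaced by '>=' (hand-written sketch).
    static boolean canPopMutant(int size) {
        return size >= 0;
    }

    public static void main(String[] args) {
        // Only the boundary input size == 0 kills this mutant.
        System.out.println(canPop(0));       // prints false
        System.out.println(canPopMutant(0)); // prints true
        // Any other input leaves the mutant alive.
        System.out.println(canPop(5) == canPopMutant(5));   // prints true
        System.out.println(canPop(-3) == canPopMutant(-3)); // prints true
    }
}
```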
Example
For VendingMachine, all mutation scores except Edge Coverage's are below 10%
muJava creates dozens of mutants on its predicates, and the mostly random values created by the three generators have little chance of killing those mutants
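A quick simulation shows why. The predicate below is a hypothetical one in the spirit of VendingMachine (not its actual code): only exact coin denominations pass, so only a few of the possible input values ever reach the guarded code:

```java
import java.util.Random;

public class PredicateOdds {
    // Hypothetical predicate in the spirit of VendingMachine:
    // only exact coin denominations are accepted.
    static boolean validCoin(int cents) {
        return cents == 5 || cents == 10 || cents == 25;
    }

    // Count how many of `trials` uniform random inputs satisfy it.
    static int hits(int trials, long seed) {
        Random rng = new Random(seed);
        int hits = 0;
        for (int i = 0; i < trials; i++) {
            if (validCoin(rng.nextInt(100))) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Only 3 of the 100 values in [0, 100) satisfy the predicate,
        // so roughly 3% of random inputs reach the guarded code, and
        // mutants behind such predicates mostly survive random testing.
        System.out.println(hits(10_000, 42));
    }
}
```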
Example
Scores for BoundedStack were the second lowest for all the test sets except Edge Coverage
Only two of its eleven methods have parameters; the three test generators depend largely on the method signature, so fewer parameters may mean weaker tests
Example
JCrasher got the highest mutation score among the three generators
JCrasher uses invalid values to attempt to "crash" the class
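A crash-style test can be sketched as follows; the method and inputs are illustrative assumptions in the spirit of JCrasher, not its actual output:

```java
public class CrashSketch {
    // Hypothetical method under test.
    static char firstChar(String s) {
        return s.charAt(0);
    }

    public static void main(String[] args) {
        // Invalid input in the spirit of JCrasher: a null argument
        // triggers an uncaught NullPointerException, which a
        // crash-oriented tool reports as a failure.
        try {
            firstChar(null);
            System.out.println("no crash");
        } catch (NullPointerException e) {
            System.out.println("crashed"); // prints "crashed"
        }
    }
}
```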
Conclusion
By themselves, these three tools generate tests that are very poor at detecting faults
Among publicly accessible tools, criteria-based testing is hardly used
We need better automated test generation tools
Contact
Shuang Wang
Computer Science Department
George Mason University