Coverage-Directed Differential Testing of JVM Implementations

Slides:



Advertisements
Similar presentations
Grey Box testing Tor Stålhane. What is Grey Box testing Grey Box testing is testing done with limited knowledge of the internal of the system. Grey Box.
Advertisements

SE 450 Software Processes & Product Metrics Reliability: An Introduction.
1 Application of Metamorphic Testing to Supervised Classifiers Xiaoyuan Xie, Tsong Yueh Chen Swinburne University of Technology Christian Murphy, Gail.
Basic Data Mining Techniques
Parameterizing Random Test Data According to Equivalence Classes Chris Murphy, Gail Kaiser, Marta Arias Columbia University.
Session-02. Objective In this session you will learn : What is Class Loader ? What is Byte Code Verifier? JIT & JAVA API Features of Java Java Environment.
Test coverage Tor Stålhane. What is test coverage Let c denote the unit type that is considered – e.g. requirements or statements. We then have C c =
© 2012 IBM Corporation Rational Insight | Back to Basis Series Chao Zhang Unit Testing.
VeriFlow: Verifying Network-Wide Invariants in Real Time
Overview of Software Testing 07/12/2013 WISTPC 2013 Peter Clarke.
A New Method For Developing IBIS-AMI Models
Min Chen, and Yuhong Yan Concordia University, Montreal, Canada Presentation at ICWS 2012 June 24-29, 2012, Hawaii (Honolulu), USA Redundant Service Removal.
Rev. 0 CONFIDENTIAL Mod.19 02/00 Rev.2 Mobile Terminals S.p.A. Trieste Author: M.Fragiacomo, D.Protti, M.Torelli 31 Project Idea Feasibility.
Question of the Day  On a game show you’re given the choice of three doors: Behind one door is a car; behind the others, goats. After you pick a door,
Coverage Criteria for Testing of Object Interactions in Sequence Diagrams Atanas (Nasko) Rountev Scott Kagan Jason Sawin Ohio State University.
Functional Verification Figure 1.1 p 6 Detection of errors in the design Before fab for design errors, after fab for physical errors.
Grey Box testing Tor Stålhane. What is Grey Box testing Grey Box testing is testing done with limited knowledge of the internal of the system. Grey Box.
Test Coverage CS-300 Fall 2005 Supreeth Venkataraman.
Programming with Microsoft Visual Basic th Edition
Chapter 10 Verification and Validation of Simulation Models
Finding Errors in.NET with Feedback-Directed Random Testing Carlos Pacheco (MIT) Shuvendu Lahiri (Microsoft) Thomas Ball (Microsoft) July 22, 2008.
1 Exposing Behavioral Differences in Cross-Language API Mapping Relations Hao Zhong Suresh Thummalapenta Tao Xie Institute of Software, CAS, China IBM.
Computer Science 1 Systematic Structural Testing of Firewall Policies JeeHyun Hwang 1, Tao Xie 1, Fei Chen 2, and Alex Liu 2 North Carolina State University.
Software Quality Assurance and Testing Fazal Rehman Shamil.
Random Test Generation of Unit Tests: Randoop Experience
8 Nov 1.Backtracking 2.Structure clashes & program decomposition.
Business Monitoring Framework for Process Discovery with Real-Life Logs Mari Abe Michiharu Kudo IBM Research - Tokyo.
Test Case Purification for Improving Fault Localization presented by Taehoon Kwak SoftWare Testing & Verification Group Jifeng Xuan, Martin Monperrus [FSE’14]
Topics  Direct Predicate Characterization as an evaluation method.  Implementation and Testing of the Approach.  Conclusions and Future Work.
Software Testing. Software Quality Assurance Overarching term Time consuming (40% to 90% of dev effort) Includes –Verification: Building the product right,
Experience Report: System Log Analysis for Anomaly Detection
Applying Combinatorial Testing to Data Mining Algorithms
Chapter 10 NP-Complete Problems.
Software Testing.
The Impact of Concurrent Coverage Metrics on Testing Effectiveness
Chapter 4: Business Process and Functional Modeling, continued
Testing Tutorial 7.
Generating Automated Tests from Behavior Models
Introduction to Advanced Java Programming
ISP Coverage Criteria CS 4501 / 6501 Software Testing
WSRec: A Collaborative Filtering Based Web Service Recommender System
Input Space Partition Testing CS 4501 / 6501 Software Testing
High Performance Computing on an IBM Cell Processor --- Bioinformatics
*Acknowledgements: Dawn Song, Kostya Serebryany,
Syntax-based Testing CS 4501 / 6501 Software Testing
Software Development Cycle
Chapter 10 Verification and Validation of Simulation Models
*Acknowledgements: Suman Jana, Dawn Song, Kostya Serebryany,
Test Case Purification for Improving Fault Localization
Taxonomy of Test Oracles
Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights Feng Zhang †⋄, Jidong Zhai ⋄, Xipeng Shen #, Onur Mutlu ⋆, Wenguang.
URBAN STREAM REHABILITATION
Collecting and Interpreting Quantitative Data – Introduction (Part 1)
ISP Coverage Criteria CS 4501 / 6501 Software Testing
Test coverage Tor Stålhane.
Overview of Query Evaluation
Institute of Computing Tech.
Software Development Cycle
MOEA Testing and Analysis
STEM PROJECT Engineering Design
Software Development Cycle
M. Kezunovic (P.I.) S. S. Luo D. Ristanovic Texas A&M University
Creating Additional Input Items
By Hyunsook Do, Sebastian Elbaum, Gregg Rothermel
FOT: A Versatile, Configurable, Extensible Fuzzing Framework
(presentor: jee-weon Jung)
Software Testing and QA Theory and Practice (Chapter 5: Data Flow Testing) © Naik & Tripathy 1 Software Testing and Quality Assurance Theory and Practice.
Mitigating the Effects of Flaky Tests on Mutation Testing
Chapter 11 Sets, and Selection
L. Glimcher, R. Jin, G. Agrawal Presented by: Leo Glimcher
Presentation transcript:

Coverage-Directed Differential Testing of JVM Implementations Yuting Chen1, Ting Su2, Chengnian Sun3, Zhendong Su3, and Jianjun Zhao1,4 1Shanghai Jiao Tong University 2East China Normal University 3University of California, Davis 4Kyushu University

Outline Motivation Design Evaluation Related Work and Conclusion Testing of JVMs Test redundancy Goal + Key Observations Design Evaluation Related Work and Conclusion

Background: JVM *.class *.jar JVM Execution results

JVM Testing Testing a JVM using a number of test classfiles class0 … How is a JVM defect exposed? How is a test classfile achieved?

Challenge 1: How to expose a JVM defect? no test oracles Solution: differential JVM testing

An Example of JVM Behavior Discrepancy public abstract {}; HotSpot takes it as a ordinary method J9 reports a format error Cause: the JVM specification says that “other methods named <clinit> in a class file are of no consequence. They are not class or interface initialization methods. ” A class method needs to be more strictly defined

Challenge 2: How to obtain test classfiles? Option 1: using the real-world classfiles Option 2: domain-aware fuzz testing Some classes can reveal compatibility issues

An Example of JVM Behavior Discrepancy public abstract mymethod{}; clinit More JVM discrepancies are revealed by domain-aware fuzz testing

Challenge 2: How to obtain test classfiles? Option 1: using the real-world classfiles Option 2: domain-aware fuzz testing Some classes can reveal compatibility issues 

Challenge 2: How to obtain test classfiles? Solution 2: domain-aware fuzz testing An infinite number of test classfiles can be created They may reveal a small number of JVM discrepancies Test redundancy

Key Observation (1) legal or illegal legal X A classfile can encompass intricate constraints Corner cases can be created through rewriting seeds legal or illegal legal X Syntax tree Revised syntax tree

Key Observation (2) Equivalence class partition (ECP) saves the testing cost ECP works only if we can decide whether two tests belong to the same partition

Our Design: An Overview HotSpot for Java7 HotSpot for Java8 HotSpot for Java9 IBM’s J9 GIJ

Our Design: An Overview mutation testing selection It makes the classfile mutants more representative HotSpot for Java7 HotSpot for Java8 HotSpot for Java9 IBM’s J9 GIJ

Outline Motivation Design Evaluation Related Work and Conclusion Mutating classfiles Selecting representative classfile mutants Selectively applying mutators Differentially testing JVMs Evaluation Related Work and Conclusion

We have designed 129 mutators for mutating classfiles mutation testing selection Mutating Classfiles We have designed 129 mutators for mutating classfiles

Mutating Classfiles (2) mutation testing selection Mutating Classfiles (2) 123 mutators are designed for rewriting the ASTs of the seeds Six mutators are designed for rewriting the Jimple files of the seeds r0:=parameter0: java.lang.String[]; r1:=<java.lang.System: java.io.PrintStream out>; virtualinvoke $r1.<java.io.PrintStream:void println(java.lang.String)>("Executed"); … X “private” Limitation: Only the JVMs’ startup processes can be tested The mutated program constructs/attributes may be less likely to be activated during execution class class’

JVM Startup Errors and exceptions Errors and exceptions Creation and loading Linking Initialization Execution Errors and exceptions Errors and exceptions Errors and exceptions

JVM Startup (2) Errors and exceptions Creation and loading Linking Initialization Execution Errors and exceptions Errors and exceptions Can this class be normally executed, or at which stage some errors or exceptions can be thrown out? Errors and exceptions

Outline Motivation Design Evaluation Related Work and Conclusion Mutating classfiles Selecting representative classfile mutants Selectively applying mutators Differentially testing JVMs Evaluation Related Work and Conclusion

Selecting Representative Classfiles mutation testing selection Selecting Representative Classfiles ECP Do two classfiles belong to the same class? Yes, if the reference JVM equally processes them No, otherwise HotSpot for Java9

Selecting Representative Classfiles (2) mutation testing selection Selecting Representative Classfiles (2) Do two classfiles belong to the same class? Seeds 7000 smts, 2300 branches HotSpot for Java9 7020 stmts, 2340 branches New mutant ? stmts, ? branches Is it representative? Several comparison criteria can be given here

Outline Motivation Design Evaluation Related Work and Conclusion Mutating classfiles Selecting representative classfile mutants Selectively applying mutators Differentially testing JVMs Evaluation Related Work and Conclusion

Selecting Mutators Goal: to create as many representative classfiles as possible Fact: mutators are designed arbitrarily; some are effective, while some others are useless A naïve solution: to select mutators by learning from prior knowledge

An MCMC Sampling Method Which mutator will be selected at each step? A desired distribution: geometric distribution A has higher success rate than B, and thus it should be easier to be selected than B The actual distribution A B Proposition: The more number of representative classfiles have been created by a mutator, the more likely the mutator should be selected for further mutations

More Details The desired distribution Classfuzz picks up mutators at random, and then accepts or rejects the mutators by a Metropolis choice

Outline Motivation Design Evaluation Related Work and Conclusion Mutating classfiles Selecting representative classfile mutants Selectively applying mutators Differentially testing JVMs Evaluation Related Work and Conclusion

Execution Comparison result0 = jvm0 (env0, c, input) A JVM discrepancy appears when resulti≠resultj It can either be a JVM defect or a compatibility issue

Execution Comparison (2) result0 = jvm0 (env0, c, input) result1 = jvm1 (env0, c, input) result2 = jvm2 (env0, c, input) result3 = jvm3 (env0, c, input) result4 = jvm4 (env0, c, input) A JVM defect appears when resulti≠resultj

Execution Comparison (3) result0 = jvm0 (env0, c, input) result1 = jvm1 (env0, c, input) result2 = jvm2 (env0, c, input) result3 = jvm3 (env0, c, input) result4 = jvm4 (env0, c, input) A JVM defect appears when resulti≠resultj

Execution Comparison (3) result0 = jvm0 (env0, c, input) result1 = jvm1 (env0, c, input) result2 = jvm2 (env0, c, input) result3 = jvm3 (env0, c, input) result4 = jvm4 (env0, c, input) A JVM defect appears when resulti≠resultj jvm0 may miss catching some format errors 1

Outline Motivation Design Evaluation Related Work and Conclusion Setup Results Related Work and Conclusion

At each run the coverage can be conveniently collected Setup Coverage collection HotSpot for Java9 GCOV + LCOV Seeds 1216 classfiles in JRE 7 At each run the coverage can be conveniently collected  HotSpot (260K LOCs) Cost for cov. analysis: 30+ mins share/vm/classfile/ (11977 LOCs) Cost for cov. analysis: 90 secs

Evaluated Methods Classfuzz supplemented with a uniqueness criterion [st], [stbr], [tr] – explained in the paper Randfuzz, Greedyfuzz, Uniquefuzz classfuzz randfuzz greedyfuzz uniquefuzz Mutation-based  Cov. analysis  Uniqueness criterion [st] [stbr] [tr] Mutator selection

Metrics RQ1: How many test classfiles can be generated? #Iterations, |GenClasses|, |TestClasses| RQ2: How effective are the test classfiles? |Discrepancies|, |Distinct Discrepancies|, diff rate RQ3: Can the test classfiles find any JVM defects?

Outline Motivation Design Evaluation Related Work and Conclusion Setup Results Related Work and Conclusion

Results on Classfile Generation X Randfuzz generates 20 times as many classfiles as those generated by any other algorithm Classfuzz[stbr] generates the most number of representative classfiles Classfuzz[stbr] achieves the highest success rate among all the coverage-directed algorithms

Results on Classfile Generation (2) Classfuzz can utilize the prior knowledge to select mutators strongly correlated succ rate classfuzz randfuzz weakly correlated

Results on Differential JVM Testing Classfuzz can enhance the ratio of discrepancy triggering classfiles from 1.7% to 11.9% JVMs are compatible for most of the classfiles, but differ in processing corner cases We have experienced 898 different execution paths. 107 paths were related to JVM behavior differences

Discrepancy Analysis (1) public abstract {}; HotSpot takes it as a ordinary method J9 reports a format error The JVM specification needs to be clarified

Discrepancy Analysis (2) A type casting needs to be performed JVMs take their own classfile verification and type checking polices

Discrepancy Analysis (3) JVMs are not compatible to access some classes

Discrepancy Analysis (4) More findings J9 is less strict than HotSpot because J9 only verifies a method when it is invoked, while HotSpot verifies all methods before execution GIJ can execute an interface having a main method GIJ accepts a class with duplicate fields … These discrepancies can be found in a package of 62 discrepancy-triggering classes

Outline Motivation Design Evaluation Conclusion

Conclusion Problem Proposal: classfuzz: coverage-directed fuzz testing Testing JVMs requires painstaking effort in designing test classfiles along with their test oracles Proposal: classfuzz: coverage-directed fuzz testing Test classfile generation Mutating classes and selectively applying mutators Deciding the representativeness of a classfile mutant Differential JVM testing Tool is available at http://stap.sjtu.edu.cn/˜chenyt/DTJVM/index.htm