Mutation Analysis with Coverage Discounting Peter Lisherness, Nicole Lesperance, and Kwang-Ting (Tim) Cheng University of California – Santa Barbara.

Slides:



Advertisements
Similar presentations
Determining Test Quality through Dynamic Runtime Monitoring of SystemVerilog Assertions Kelly D. Larson
Advertisements

Applications of Synchronization Coverage A.Bron,E.Farchi, Y.Magid,Y.Nir,S.Ur Tehila Mayzels 1.
High-Level Fault Grading. Improving Gate-Level Fault Coverage by RTL Fault Grading* * W. Mao and R. K. Gulati, ITC 1996, pp
- Verifying “Golden” reused IPs The Evil’s in the Edits William C Wallace Texas Instruments Nitin Jayaram Texas Instruments Nitin Mhaske Atrenta Inc Vijay.
Xiushan Feng* ASIC Verification Nvidia Corporation Automatic Verification of Dependency 1 TM Jayanta Bhadra
CS527: Advanced Topics in Software Engineering (Software Testing and Analysis) Darko Marinov September 18, 2008.
Regression Methodology Einat Ravid. Regression Testing - Definition  The selective retesting of a hardware system that has been modified to ensure that.
An Analysis and Survey of the Development of Mutation Testing by Yue Jia and Mark Harmon A Quick Summary For SWE6673.
Xiushan Feng* ASIC Verification Nvidia Corporation Assertion-Based Design Partition 1 TM Jayanta Bhadra, Ross Patterson.
How to Accelerate the Analog Design Verification Flow Itai Yarom Senior Verification Expert Synopsys.
MS-SoC Best Practices – Advanced Modeling & Verification Techniques for first-pass success By Neyaz Khan Greg Glennon Dan Romaine.
Who’s Watching the Watchmen? The Time has Come to Objectively Measure the Quality of Your Verification by David Brownell Design Verification Analog Devices.
Coverage Discounting: Improved Testbench Qualification by Combining Mutation Analysis with Functional Coverage Nicole Lesperance, Peter Lisherness, and.
Planning under Uncertainty
Beyond Bloom Filters: From Approximate Membership Checks to Approximate State Machines By F. Bonomi et al. Presented by Kenny Cheng, Tonny Mak Yui Kuen.
1 Today Another approach to “coverage” Cover “everything” – within a well-defined, feasible limit Bounded Exhaustive Testing.
RIT Software Engineering
SE 450 Software Processes & Product Metrics 1 Defect Removal.
Chris Wilson and David L. Dill Computer Systems Laboratory Stanford University June, 2000 Reliable Verification Using Symbolic Simulation with Scalar Values.
EE694v-Verification-Lect5-1- Lecture 5 - Verification Tools Automation improves the efficiency and reliability of the verification process Some tools,
Statistical Critical Path Selection for Timing Validation Kai Yang, Kwang-Ting Cheng, and Li-C Wang Department of Electrical and Computer Engineering University.
Effective Questioning in the classroom
Machine Learning in Simulation-Based Analysis 1 Li-C. Wang, Malgorzata Marek-Sadowska University of California, Santa Barbara.
AMOST Experimental Comparison of Code-Based and Model-Based Test Prioritization Bogdan Korel Computer Science Department Illinois Institute of Technology.
Software Testing and Validation SWE 434
March 8, 2006Spectral RTL ATPG1 High-Level Spectral ATPG for Gate-level Circuits Nitin Yogi and Vishwani D. Agrawal Auburn University Department of ECE.
Class Specification Implementation Graph By: Njume Njinimbam Chi-Chang Sun.
 1  Outline  stages and topics in simulation  generation of random variates.
Beyond Call Recording: Speech Improves Quality Assurance Larry Mark Chief Technology Officer SER Solutions, Inc.
Some Course Info Jean-Michel Chabloz. Main idea This is a course on writing efficient testbenches Very lab-centric course: –You are supposed to learn.
Test Suite Reduction for Regression Testing of Simple Interactions between Two Software Modules Dmitry Kichigin.
Reporter: PCLee. Although assertions are a great tool for aiding debugging in the design and implementation verification stages, their use.
The First in GPON Verification Classic Mistakes Verification Leadership Seminar Racheli Ganot FlexLight Networks.
Robust Low Power VLSI ECE 7502 S2015 Evaluation of Coverage-Driven Random Verification ECE 7502 – Project Presentation Qing Qin 04/23/2015.
1 Hybrid-Formal Coverage Convergence Dan Benua Synopsys Verification Group January 18, 2010.
Summarizing “Structural” Testing Now that we have learned to create test cases through both: – a) Functional (blackbox)and – b) Structural (whitebox) testing.
CS3505: DATA LINK LAYER. data link layer  phys. layer subject to errors; not reliable; and only moves information as bits, which alone are not meaningful.
1 Program Planning and Design Important stages before actual program is written.
Prioritizing Test Cases for Regression Testing Article By: Rothermel, et al. Presentation by: Martin, Otto, and Prashanth.
“Isolating Failure Causes through Test Case Generation “ Jeremias Rößler Gordon Fraser Andreas Zeller Alessandro Orso Presented by John-Paul Ore.
Mutation Testing G. Rothermel. Fault-Based Testing White-box and black-box testing techniques use coverage of code or requirements as a “proxy” for designing.
TOPIC : Introduction to Fault Simulation
Software Testing Part II March, Fault-based Testing Methodology (white-box) 2 Mutation Testing.
Properties Incompleteness Evaluation by Functional Verification IEEE TRANSACTIONS ON COMPUTERS, VOL. 56, NO. 4, APRIL
Building usable software through early testing. Objective Show the value of testing from the beginning of the development cycle. Consensus in the industry.
Detecting Hardware Trojans in Unspecified Functionality Using Mutation Testing Nicole Fern K.-T. Tim Cheng UC Santa Barbara 1.
Hongjie Zhu,Chao Zhang,Jianhua Lu Designing of Fountain Codes with Short Code-Length International Workshop on Signal Design and Its Applications in Communications,
Improving Fault Tolerance in AODV Matthew J. Miller Jungmin So.
FMCAD A Theory of Mutations with Applications to Vacuity, Coverage, and Fault Tolerance Orna Kupferman 1 Wenchao Li 2 Sanjit A. Seshia 2 1 Hebrew.
Mutation Testing Breaking the application to test it.
Random Test Generation of Unit Tests: Randoop Experience
MUTACINIS TESTAVIMAS Benediktas Knispelis, IFM-2/2 Mutation testing.
Mutation Testing Laraib Zahid & Mariam Arshad. What is Mutation Testing?  Fault-based Testing: directed towards “typical” faults that could occur in.
Benefits of a Virtual SIL
Part I Project Initiation.
Software Testing.
Using Execution Feedback in Test Case Generation
Mutation testing Julius Purvinis IFM-0/2.
ECE 553: TESTING AND TESTABLE DESIGN OF DIGITAL SYSTES
Ask the Mutants: Mutating Faulty Programs for Fault Localization
Teaching The Art of Verification
Software Testing (Lecture 11-a)
August Shi, Tifany Yung, Alex Gyori, and Darko Marinov
Test Case Purification for Improving Fault Localization
Another Example Consider a very popular computer game played by millions of people all over the world. The average score of the game is known to be
Aiman H. El-Maleh Sadiq M. Sait Syed Z. Shazli
Uncovered failures in MSS
Mitigating the Effects of Flaky Tests on Mutation Testing
Mutation Testing Faults are introduced into the program by creating many versions of the program called mutants. Each mutant contains a single fault. Test.
Presentation transcript:

Mutation Analysis with Coverage Discounting Peter Lisherness, Nicole Lesperance, and Kwang-Ting (Tim) Cheng University of California – Santa Barbara

Motivation 1 Functional Coverage Mutation Analysis Coverage Discounting + Functionally Meaningful + Evaluates design activation -Ignores propagation and checker quality + Evaluates propagation and checker quality - Mutants hard to analyze + Functionally Meaningful + Evaluates design activation, propagation and checkers

Coverage Discounting: The Main Idea Use fault insertion to discount (revise downward) coverage scores 2 Detected + undetected mutants Functional Coverage Mutation Analysis Coverage Discounting Undetected Mutants Coverage Score 100% 0% Discounted Coverage Score

An Existing Solution: Mutation Analysis DUT Tests Checker Undetected Fix Add/Fix Detected Benefits – Evaluates quality of testbench checkers – Indicates if tests fail to propagate potential errors to the checker or fail to activate mutants 3

Mutation Analysis Drawbacks 1.Long runtime for simulation 2.Many man-hours required to analyze results: 2.1 Mutants are synthetic - difficult to identify testbench improvements to target mutant detection 2.2Redundant mutants never detectable: Some undetected faults are redundant if (a > 0) b=1+a; else b=1-a; if (a >= 0) b=1+a; else b=1-a; Example: 4

Premise of Coverage Discounting 1) A mutant may change design functionality 2) A coverpoint, covered by testbench in the original design, may be suppressed (i.e. no longer be covered) in the mutated design 3) If the mutant suppressing the coverpoint is not detected, then the coverpoint is no longer considered covered in the original design 5

Coverage Discounting Flow 6 DUT Tests Checker Detected Undetected Add more tests No Yes Fix Coverage Changed?

An Example Original Design if (remaining_length >= 5) begin burst_en=true; length_next:=remaining_length - 4 end else begin burst_en=false; length_next:=remaining_length - 1 end... BURST_MODE : coverpoint burst_en; 7 Number of packets sent in burst mode increased from 4 to 5 in the condition, but designers forgot to increase packet number everywhere else

An Example Mutated Design false if (false) begin burst_en=true; length_next:=remaining_length - 4 end else begin burst_en=false; length_next:=remaining_length - 1 end... BURST_MODE : coverpoint burst_en; *The mutant goes undetected because the checkers are only sensitive to bus content and ordering, not timing 8

An Example Since the coverpoint BURST_MODE is suppressed by the mutant and the mutant is undetected, BURST_MODE is discounted It is now clear that the checker must check for timing in order to adequately cover burst mode functionality If timing is checked, the real bug is caught 9

Experimental Results 10 Experimental Results

OpenRISC CPU Experiment #1: Measure test quality – Simulation Infrastructure Functional simulator (ISS) Ad-hoc fault insertion – Tests: Instruction-Set v. Random – Coverpoints: Opcodes Experiment #2: Identify checker weaknesses – Simulation Infrastructure: SoC full-chip RTL simulation Certitude fault insertion – Tests: SoC Regression Suite – Coverpoints: CPU top-level signals 11

Measure Test Quality Functional Tests Original Discounted Random Tests 12

Identify Checker Weaknesses Tests 13

Identify Checker Weaknesses 14

UART Illustrates success of coverage discounting applied to a sophisticated testbench Simulation Infrastructure: – Design RTL w/OVM Testbench – SystemVerilog coverpoints (ModelSim) – Certitude fault insertion Tests: OVM Sequences Coverpoints: Hand-written spec-based 15

UART Results Reduces debug effort from analysis of 146 mutants to examining 3 functional coverpoints 1588 Mutants: – 7 not activated – 106 not propagated – 33 not detected – Total 146 mutants demand attention 846 Coverpoints: – 4 uncovered – 3 discounted – All discounted relate to specific unchecked functions Loopback, timeout interrupt identification register 16

Results Examined Coverage discounting: – Identified "good" tests (propagation) – Diagnosed checker problems – Identified coverpoints vacuously covered However: – Runtime of mutation analysis still an issue – Technique sensitive to the quantity and quality of faults 17

Confidence Metric Aims to answer: 1.Is a coverpoint adequately challenged by the set of mutants? 2.When have enough faults been inserted (when can simulation be stopped)? 3.What is the optimal simulation ordering of mutants for coverage discounting? 18

Detection Confidence (DECO) Score Discounting relies on coverpoints being suppressed by faults Point confidence: the number of times a coverpoint is suppressed DECO(n): computes the percentage of coverpoints with point confidence greater than n 19

DECO-Directed Ordering Optimize test/mutant simulation order to increase DECO score 1.Test Selection: Choose test covering the most low confidence points 2.Mutant Selection: Select mutant activated by the fewest tests 20

Quicker Confidence CPUUART 21

Quicker Discounting CPUUART 22

Summary – DECO Provides feedback earlier Improves confidence (both over time and at termination) Enables early termination 23

Summary Analyzed information gained from coverage discounting for two designs Developed a confidence metric to gauge mutant effectiveness and an ordering heuristic to reduce runtime 24