(Quickly) Testing the Tester via Path Coverage Alex Groce Oregon State University (formerly NASA/JPL Laboratory for Reliable Software)

A Sad Software Story
A very important space mission. A critical module: the multiplier, for Mars.
Test engineer: “If this fails, we could lose the mission!”
Automated testing!

A Sad Software Story
The test engineer builds a complex automated test framework for the Mars multiplier.
6 months… 8,976,423,124 tests… improvements… bug fixes… tester changes…
1,000,000,000 tests with NO failures!

A Sad Software Story
Launch! Mission Day 9: 6 x 9 = 42… 42???

A Sad Software Story
“We found three very subtle bugs. Manual testing would never have found them. We assumed it would find all the important bugs.”
“The automated tests had very high branch coverage.”
“We ran the tester for six days in a row, and found no bugs.”
Congressional hearings

Automated Software Testing
Powerful, effective, important, but…
– Relies on a large code base that may be nearly as complex as the module under test!
– Behavior too complex to really understand
– Configuration management can be a nightmare
– Invites complacency about testing and neglect of manual tests
When a bug is introduced into the tester, the result may be lots of passing tests
– Very hard to know when something is wrong
(Congressional hearings: conclusions)

The Problem
Very hard to know when something is wrong.
How do we know when an automated tester is producing false negatives (no failed tests) due to a bug in the tester?
– “Bug” may mean a coding error, a configuration foul-up, or a fundamentally bogus assumption

The Problem
Automated testers are highly complex software systems with behavior that is:
– Particularly hard to specify (“find all the bugs” is not a nice clean LTL property or assertion)
– Pretty much impossible for humans to understand (how do you summarize 100,000,000 tests?)
– Easy to get wrong
– Potentially mission- or safety-critical

Possible Solutions?
– Traditional regression testing
– Differential testing (“bakeoff”)
– Coverage measures

Traditional Regression Testing
Run the latest tester on old (known buggy) versions of the system under test (SUT).
Good:
– Good for detecting regressions of the tester
– Easy to understand results (“Yesterday, my tester caught this bug; today, it does not”)
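A minimal sketch of the comparison this slide describes, assuming each known-buggy SUT version has already been run under yesterday's and today's tester builds and the detect/miss outcomes recorded. The `known_bug` record and `tester_regressions` function are hypothetical names, not part of any real tool.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record: one old, known-buggy SUT version and whether
   each tester build reported a failure on it. */
struct known_bug {
    const char *version;  /* label for the old SUT version */
    int caught_by_old;    /* 1 if yesterday's tester reported a failure */
    int caught_by_new;    /* 1 if today's tester reported a failure */
};

/* Counts bugs the old tester caught but the new one misses.
   Each one is a regression of the tester itself, not of the SUT. */
int tester_regressions(const struct known_bug *bugs, int n) {
    assert(n == 0 || bugs != NULL);
    int lost = 0;
    for (int i = 0; i < n; i++) {
        if (bugs[i].caught_by_old && !bugs[i].caught_by_new) {
            printf("tester regression: no longer catches %s\n", bugs[i].version);
            lost++;
        }
    }
    return lost;
}
```

A nonzero count names exactly which historical bug the tester stopped catching, which is what makes this kind of result easy to understand.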

Traditional Regression Testing
Bad:
– Changes to the interface of the SUT require lots of work
– Very coarse and very slow: we need a full run to compare
– Old bugs may be easier to find: as software becomes more mature, the remaining bugs are (almost by definition) lower probability

Differential Testing
A variation: compare against a different tester on the current software version.
Problems:
– Where do we get another effective automated tester? These things are hard to write!
– If it’s better, why not just use that one? Why bother with the copper tester when we have a gold standard available?

Coverage
Branch and statement coverage:
– Good minimal checks: know why lines that aren’t covered aren’t covered
– RED ALERT if a previously covered branch isn’t covered by the latest version of the tester
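The RED ALERT check can be sketched as a comparison of two per-branch coverage maps. This assumes branch IDs are stable between tester versions and that each run yields a boolean array indexed by branch ID; `lost_branches` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* prev[i] / curr[i] are true if branch i was taken at least once
   during a full run of the previous / latest tester. */
int lost_branches(const bool *prev, const bool *curr, int nbranches) {
    assert(nbranches >= 0);
    int lost = 0;
    for (int i = 0; i < nbranches; i++) {
        if (prev[i] && !curr[i]) {
            printf("RED ALERT: branch %d covered by previous tester, not by latest\n", i);
            lost++;
        }
    }
    return lost;
}
```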

Coverage
Branch and/or statement coverage:
– Coarse: random testing and model checking perform similarly, even in cases where model checking is known to be better at fault detection
– Slow: it may take a full test period to find a difference in branch coverage, and full automated test runs often take a day or two
– When do we declare the coverage worse, given the all-or-nothing nature of covering branches?

Path Coverage
Fine-grained:
– Therefore often quick
– Exposes differences between test approaches that aren’t detected by branch coverage
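A small worked example of why path coverage is finer than branch coverage, using a hypothetical function with two independent decisions: a two-test suite takes all four branches yet exercises only two of the four paths. The names `path_id` and `measure` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Two independent decisions: 4 branches, but 2 x 2 = 4 paths. */
static int path_id(int a, int b) {
    int id = 0;
    if (a > 0) id |= 1;  /* bit 0 records the first decision */
    if (b > 0) id |= 2;  /* bit 1 records the second decision */
    return id;
}

/* Runs a suite of (a, b) inputs and counts distinct branches and paths hit. */
void measure(int (*tests)[2], int ntests, int *branches_hit, int *paths_hit) {
    assert(ntests >= 0);
    /* branch[0]/[1]: a > 0 true/false; branch[2]/[3]: b > 0 true/false */
    bool branch[4] = {false};
    bool path[4] = {false};
    for (int t = 0; t < ntests; t++) {
        int id = path_id(tests[t][0], tests[t][1]);
        branch[(id & 1) ? 0 : 1] = true;
        branch[(id & 2) ? 2 : 3] = true;
        path[id] = true;
    }
    *branches_hit = 0;
    *paths_hit = 0;
    for (int i = 0; i < 4; i++) {
        *branches_hit += branch[i];
        *paths_hit += path[i];
    }
}
```

With the suite `{{1, 1}, {-1, -1}}`, branch coverage is complete (4 of 4), while path coverage is 2 of 4; adding `{1, -1}` would change only the path count.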

Another Software Story
File system modules for JPL’s Mars Science Laboratory mission
Automated testing system based on explicit-state model checking [VMCAI 08, WODA 08, CFV 08, ASE 08]
Weeks of “no bugs” testing
– The developer of the file system happened to stumble across some bugs while testing new functionality
“How did we miss this stuff???”

Path Coverage
Instrument with CIL: track the path bitvector and the function entry.

    if (x == 3) {
      x++;
      if (y > 0) {
        y++;
      }
    } else {
      x--;
    }

becomes

    if (x == 3) {
      add_to_bv(pathBV, 1);
      x++;
      if (y > 0) {
        add_to_bv(pathBV, 1);
        y++;
      } else {
        add_to_bv(pathBV, 0);
      }
    } else {
      add_to_bv(pathBV, 0);
      x--;
    }

The instrumented version adds an else branch to the inner if, so that both outcomes of every decision append a bit to the path.
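A minimal sketch of what the runtime side of this instrumentation might look like. The slide does not show `add_to_bv` or the type of `pathBV`, so the 64-bit representation, the hypothetical `reset_bv` call at function entry, and the 64-decision limit below are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical path bitvector: one bit per decision since function entry.
   Assumes at most 64 decisions per call; a real tool would grow or hash. */
typedef struct {
    uint64_t bits;
    unsigned len;
} path_bv;

static path_bv current_path;
static path_bv *pathBV = &current_path;

/* Assumed to be called at instrumented function entry. */
void reset_bv(path_bv *bv) { bv->bits = 0; bv->len = 0; }

/* Appends one branch outcome (1 = then, 0 = else) to the path. */
void add_to_bv(path_bv *bv, int taken) {
    assert(bv->len < 64);
    if (taken) bv->bits |= (uint64_t)1 << bv->len;
    bv->len++;
}

/* The instrumented example from the slide, wrapped in a function. */
int example(int x, int y) {
    reset_bv(pathBV);
    if (x == 3) {
        add_to_bv(pathBV, 1);
        x++;
        if (y > 0) {
            add_to_bv(pathBV, 1);
            y++;
        } else {
            add_to_bv(pathBV, 0);
        }
    } else {
        add_to_bv(pathBV, 0);
        x--;
    }
    return x + y;
}
```

Keeping the length alongside the bits keeps a one-decision path `0` distinct from a two-decision path `00`.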

Path Coverage
Coverage here is per entry function, not whole-program paths:
– Our application is a file system
– We are testing a library, so we care about paths from top-level function entry, not whole test cases
– Takes less storage, and still distinguishes every unique path
Overhead is acceptable (~15%) because the instrumentation does not change model-checking storage time, which dominates test runtime
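Per-entry-function coverage can be sketched as a set of (entry function, path) keys, recorded when each top-level call returns. Comparing the sets from two tester versions shows which paths a change to the tester lost. The fixed-size array, linear search, and all names here are simplifying assumptions, not the tool's actual data structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One covered path, keyed by the top-level (entry) function it started in. */
typedef struct {
    int func_id;    /* hypothetical id of the library entry point */
    uint64_t bits;  /* branch outcomes recorded since that entry */
    unsigned len;   /* number of recorded outcomes */
} path_key;

#define MAX_PATHS 1024

typedef struct {
    path_key keys[MAX_PATHS];
    int n;
} path_set;

static bool same_path(path_key a, path_key b) {
    return a.func_id == b.func_id && a.bits == b.bits && a.len == b.len;
}

bool path_set_contains(const path_set *s, path_key k) {
    for (int i = 0; i < s->n; i++)
        if (same_path(s->keys[i], k)) return true;
    return false;
}

/* Records a path when an entry function returns; duplicates are ignored. */
void path_set_add(path_set *s, path_key k) {
    if (path_set_contains(s, k)) return;
    assert(s->n < MAX_PATHS);
    s->keys[s->n++] = k;
}

/* Paths the old tester reached that the new tester never reached. */
int paths_lost(const path_set *old_run, const path_set *new_run) {
    int lost = 0;
    for (int i = 0; i < old_run->n; i++)
        if (!path_set_contains(new_run, old_run->keys[i])) lost++;
    return lost;
}
```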

Traditional Regression Testing
Ten minutes of testing (x 6 processors)

Swarm Model Checking
Standard depth-first search on a very large model gets lost somewhere in a branch of a branch of a very big tree.
Heuristics? But we have no idea:
– Where the bugs are
– What the structure of the state space is
So, generate a vast array of different search configurations and transition orderings
– And let parallelism (multicore desktops) have at it!
The most effective method we know for testing programs with very large state spaces
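A sketch of generating swarm configurations, assuming each configuration is just a seed plus an ordering in which a depth-first search would try the model's transitions. The transition count, the xorshift32 PRNG, and `make_config` are hypothetical choices; launching one model-checker process per configuration is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define NTRANS 8  /* hypothetical number of model transitions (API calls) */

/* One search configuration: a seed and the order DFS tries transitions in. */
typedef struct {
    uint32_t seed;
    int order[NTRANS];
} search_config;

/* xorshift32 PRNG, so each configuration is reproducible from its seed. */
static uint32_t xorshift32(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/* Builds configuration i: the identity ordering shuffled with Fisher-Yates. */
search_config make_config(uint32_t i) {
    search_config c;
    c.seed = 2463534242u ^ (i * 2654435761u);
    if (c.seed == 0) c.seed = 1;  /* xorshift must not start at zero */
    uint32_t s = c.seed;
    for (int k = 0; k < NTRANS; k++) c.order[k] = k;
    for (int k = NTRANS - 1; k > 0; k--) {
        int j = (int)(xorshift32(&s) % (uint32_t)(k + 1));
        int tmp = c.order[k];
        c.order[k] = c.order[j];
        c.order[j] = tmp;
    }
    return c;
}
```

Because each ordering is a permutation, every configuration still explores the same model; only the part of the tree it reaches first (and within a time budget) differs.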

Test Focus
Worse overall path coverage doesn’t always mean the tester is buggy:
– We can get better coverage of some functions if we don’t cover other functions at all
– But we don’t want to cover only some functions: bugs may only arise when both are called
– Or build 500 different configurations…
– Automatic generation of a diverse set of focuses: swarm for test focus
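One plausible reading of "swarm for test focus" is random feature omission: each configuration keeps each API function with probability 1/2, so different configurations concentrate on different subsets while many still call pairs of functions together. The function count, PRNG, and `make_focus` below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NFUNCS 10  /* hypothetical number of file-system API entry points */

/* xorshift32 PRNG; the state must be nonzero. */
static uint32_t next_rand(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/* Returns a bitmask of the API functions one test configuration may call.
   Each function is kept with probability 1/2; an empty focus is redrawn. */
uint32_t make_focus(uint32_t *rng) {
    assert(*rng != 0);
    uint32_t mask = 0;
    while (mask == 0) {
        for (int f = 0; f < NFUNCS; f++)
            if ((next_rand(rng) >> 16) & 1u) mask |= 1u << f;
    }
    return mask;
}
```

Generating, say, 500 such masks and running one tester instance per mask gives each function many focused runs without anyone hand-picking the subsets.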

Is Path Coverage the Solution?
Not really.
– It’s helpful, and it finds some problems
– Branch/path coverage measures should be seen as basic due diligence for critical systems testing
– But testing the tester is still very difficult

Questions? Suggestions?
How do you test your automated testers?