Are You Sure What Failures Your Tests Produce? Lee White
Results on Testing GUI Systems
- CIS (Complete Interaction Sequence) approach for testing GUI systems: applied to four large commercial GUI systems
- Testing a GUI system in different environments: operating system, CPU speed, memory
- Modified CIS approach applied to regression-test two versions of a large commercial GUI system
Three Objectives for this Talk
- Use of memory tools during GUI testing discovered many more defects; this points to observability problems
- In GUI systems, defects manifested themselves as different failures (or not at all) in different environments
- In GUI systems, many more behaviors reside in the code than the designer intended
Complete Interaction Sequence (CIS)
- Identify all responsibilities (GUI activities that produce an observable effect on the surrounding user environment)
- CIS: operations on a sequence of GUI objects that collectively implement a responsibility
- Example (assume a file is open): File_Menu -> Print -> Print_Setup_Selection -> Confirm_Print
FSM for a CIS (Finite State Model)
- Design an FSM to model the CIS; creating the FSM model requires experience
- To test for all effects in a GUI, all paths within the CIS must be executed
- Loops may be repeated, but not consecutively
Figure 1: Edit-Cut-Copy-Paste CIS FSM
How to Test a CIS?
- Design tests: an FSM model based upon the design of the CIS is used to generate tests
- Implementation tests: in the actual GUI, check all CIS object selections and select all transitions to another GUI object within the CIS; add these transitions to the FSM model, along with any new inputs or outputs to/from the CIS, and use it to generate tests (a path-enumeration sketch follows Figure 3)
Figure 2: Design Tests for a Strongly Connected Component (states A, B, C, D)
Design tests: (I1,B,C,D,A,B,C,O1), (I2,A,B,C,D,A,B,C,O1)
Figure 3: Implementation Tests for a Strongly Connected Component
Implementation tests:
(I1,B,C,D,B,C,D,A,B,C,D,A*,B,C,O1)
(I1,B,C,D,B,C,D,A,B,C,D,A*,B,C,D,O2)
(I2,A,B,C,D,B,C,D,A,B,C,D,A*,B,C,O1)
(I2,A,B,C,D,B,C,D,A,B,C,D,A*,B,C,D,O2)
(I3,D,A,B,C,D,B,C,D,A*,B,C,O1)
(I3,D,A,B,C,D,B,C,D,A*,B,C,D,O2)
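As a rough, simplified sketch (not the talk's exact generation criterion): given an FSM for a strongly connected component such as the one behind Figures 2 and 3, candidate test paths from each input to each output can be enumerated with a bounded depth-first search. The transition table, entry/exit labels, and the visit bound below are illustrative assumptions.

    # Minimal sketch: enumerate candidate test paths through an FSM for a CIS.
    # The transitions mirror the design-level SCC of Figure 2 (A->B->C->D->A,
    # entries I1->B and I2->A, exit C->O1); the visit bound is an assumption,
    # not the exact path-selection rule used in the study.
    from collections import Counter

    DESIGN_FSM = {
        "I1": ["B"],        # entry
        "I2": ["A"],        # entry
        "A": ["B"],
        "B": ["C"],
        "C": ["D", "O1"],   # O1 is the observable output / exit
        "D": ["A"],
        "O1": [],
    }

    def enumerate_paths(fsm, entries, exits, max_visits=2):
        """Depth-first enumeration of entry-to-exit paths; each state may be
        visited at most max_visits times, so loops are exercised without
        unbounded repetition."""
        paths = []

        def dfs(state, path, visits):
            if state in exits:
                paths.append(tuple(path))
                return
            for nxt in fsm.get(state, []):
                if visits[nxt] < max_visits:
                    visits[nxt] += 1
                    dfs(nxt, path + [nxt], visits)
                    visits[nxt] -= 1

        for entry in entries:
            dfs(entry, [entry], Counter({entry: 1}))
        return paths

    for p in enumerate_paths(DESIGN_FSM, entries=["I1", "I2"], exits={"O1"}):
        print(",".join(p))

Among its output this sketch produces (I1,B,C,D,A,B,C,O1) and (I2,A,B,C,D,A,B,C,O1), the two design tests of Figure 2, along with the shorter direct paths; a real generator would also prune or extend paths according to the CIS coverage criterion.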
Table 1: Case Study of 4 Systems (Real Networks, Adobe PS Acrobat Reader, Inter WinDVD, Multi-Media DB). For each GUI system the study reports the number of GUI objects, the number of design tests and the faults they found, and the number of implementation tests and the faults they found.
Memory Tools
- Memory tools monitor memory, CPU, and register changes
- Used to detect failures that would otherwise have eluded detection; these account for 34% of the faults found in these empirical studies
- Two such tools were used: Memory Doctor and Win Gauge from Hurricane Systems (a monitoring sketch follows Table 2)
Table 2: Hidden Faults Detected by Memory Tools

GUI System             Hidden Faults   All Faults   Percent
Real Network           7               19           37%
Adobe PS Acrobat Rd    4               10           40%
Inter WinDVD           1               3            33%
Multi-Media DB         2               9            22%
Total                  14              41           34%
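The study used Memory Doctor and Win Gauge; as a rough stand-in for the idea (not those tools), one can snapshot the application's process memory around each GUI test step and flag unexpected growth, so that failures which never surface at the GUI level still become observable. The psutil usage and the threshold below are assumptions for illustration.

    # Sketch only: psutil-based memory snapshotting as a stand-in for the
    # memory tools (Memory Doctor, Win Gauge) used in the study.
    import psutil

    def run_step_with_memory_check(pid, step, description,
                                   max_growth_bytes=5_000_000):
        """Execute one GUI test step and report memory growth above a
        threshold; such hidden effects would otherwise elude GUI-level oracles."""
        proc = psutil.Process(pid)
        before = proc.memory_info().rss
        step()                      # perform the GUI action (click, menu selection, ...)
        after = proc.memory_info().rss
        growth = after - before
        if growth > max_growth_bytes:
            print(f"Possible hidden fault after '{description}': "
                  f"RSS grew by {growth} bytes")
        return growth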
Failures of GUI Tests on Different Platforms
Lee White and Baowei Fei, EECS Department, Case Western Reserve University
Environment Effects Studied
- Environment effects: operating system, CPU speed, memory changes
- Same software tested (RealOne Player), with 950 implementation tests
- For the OS effect, the same computer was used, but both Windows 98 and Windows 2000 were investigated
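A minimal sketch of the comparison step (the environment labels and test ids are hypothetical): run the same implementation test suite in each environment, record which tests fail, and diff the failure sets to expose environment-dependent behavior.

    # Sketch: the same implementation test suite is run in each environment
    # (OS, CPU speed, memory configuration); here we only diff the recorded
    # failure sets.
    def environment_sensitive_failures(failures_by_env):
        """failures_by_env maps an environment label (e.g. 'Win98', 'Win2000')
        to the set of failing test ids observed there. Returns the tests that
        fail in some environments but not in all of them."""
        every_env = set(failures_by_env)
        seen = {}
        for env, fails in failures_by_env.items():
            for test in fails:
                seen.setdefault(test, set()).add(env)
        return {t: envs for t, envs in seen.items() if envs != every_env}

    # Example usage (made-up data):
    print(environment_sensitive_failures({
        "Win98": {"t017", "t223"},
        "Win2000": {"t223"},
    }))
    # -> {'t017': {'Win98'}}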
Regression Testing GUI Systems
A Case Study to Show the Operation of the GUI Firewall for Regression Testing
GUI Features
- Feature: a set of closely related CISs with related responsibilities
- New features: features in a new version that are not in previous versions
- Totally modified features: features that are so drastically changed in a new version that the change cannot be modeled as an incremental change; the simple firewall cannot be used (see the selection sketch below)
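A rough sketch of the selection logic implied by the firewall (the feature/CIS structures and the notion of "changed GUI objects" are simplified assumptions, not the talk's implementation): new and totally modified features are tested from scratch, while for the remaining features only the CISs whose GUI objects were changed are retested.

    # Sketch: GUI firewall selection of CISs to retest in the new version.
    def select_cis_to_retest(features, new_features, totally_modified,
                             changed_objects):
        """features: dict feature_name -> list of (cis_name, set_of_gui_objects).
        Returns (retest_from_scratch, retest_selected) lists of CIS names."""
        from_scratch, selected = [], []
        for name, cis_list in features.items():
            if name in new_features or name in totally_modified:
                # No incremental model of the change: design and implementation
                # tests for these CISs are rebuilt and run from scratch.
                from_scratch.extend(cis for cis, _ in cis_list)
            else:
                # Firewall: retest only CISs that touch a changed GUI object.
                selected.extend(cis for cis, objs in cis_list
                                if objs & changed_objects)
        return from_scratch, selected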
Software Under Test
- Two versions of Real Player (RP) and RealJukeBox (RJB): RP7/RJB1 and RP8/RJB2
- Version 1 (13 features): RP7 has 208 objects, 67 CISs, 67 design tests, and 137 implementation tests; RJB1 has 117 objects, 30 CISs, 31 design tests, and 79 implementation tests
- Version 2 (16 features): RP8 has 246 objects, 80 CISs, 92 design tests, and 176 implementation tests; RJB2 has 182 objects, 66 CISs, 127 design tests, and 310 implementation tests
Figure 4: Distribution of Faults Obtained by Testers T1 and T2. The figure relates RP7/RJB1 (53 faults in the original system, tested by T1) to RP8/RJB2 (16 features, 59 faults, tested from scratch by T2), with the firewall applied across 8 features, 5 totally modified features, and 3 new features; fault counts of 17, 21, and 0 are shown for the individual groups.
Failures Identified in Version 1 and Version 2
- We could identify identical failures in Version 1 and Version 2
- This left 9 failures in Version 2 and 7 failures in Version 1 unmatched
- The challenge was to show which pairs of these failures might be due to the same fault
Different Failures in Versions V1, V2 for the Same Fault
- V1: View track in RJB freezes if an album cover is included
- V2: View track in RJB loses the album cover
- Environment problem: graphical settings needed from V2 for testing V1
Different Failures (cont.)
- V1: Add/Remove channels in RP does not work when RJB is also running
- V2: Add/Remove channels loses previous items
- Environment problem: a personal browser is used in V1, but V2 uses a special RJB browser
Different Failures (cont.)
- V1: no failure present
- V2: in RP, pressing Forward before playing a stream file crashes the system
- Environment problem: the Forward button can only be pressed during play in V1, but in V2 the Forward button can be selected at any time; regression testing now finds this fault
Conclusions for Issue #1
- The use of memory tools revealed extensive observability problems in testing GUI systems
- In testing four commercial GUI systems, 34% of faults would have been missed without these tools
- In regression testing, 85% and 90% would have been missed
- Implication: GUI testing can miss defects or surprises (or produce only minor failures)
Conclusions for Issue #2
- Defects manifested as different failures (or not at all) in different environments
- Discussed in the regression testing study
- Also observed in the testing case studies, as well as for testing in different hardware/software environments
Implication for Issue #2
When testing, you may think you understand which failures certain tests and defects will produce for the given software. But you do not know which failures (if any) will be seen by the user in another environment.
Conclusions for Issue #3
- Differences between design and implementation tests are due to non-design transitions in the actual FSMs for each GUI CIS; observed in both case studies
- Implication: faults are commonly associated with these unknown FSM transitions, and are not due to the design
Question for the Audience
- Are these same three effects valid to this extent for software other than just GUI systems?
- If so, why haven't we seen many reports and papers in the software literature documenting this?