XML-based Testing of Web Software Services
Jeff Offutt
Software Engineering, George Mason University, Fairfax, VA USA
Joint research with Wuzhi Xu, Juan Luo and Suet Chun Lee
Need for Reliable Web Apps
Expedia sells more than $35 million in tickets weekly
  –Based on 2000 data
In Dec 2006, amazon.com’s BOGO offer turned into a double discount
Huge losses on web application failures
  –Financial services : $6.5 million per hour
  –Credit card sales applications : $2.4 million per hour
  –Media companies : $150,000 per hour
Most faults are introduced during maintenance
Most security vulnerabilities are due to software faults
STEP 2008 © Jeff Offutt
Research in Software Testing
Growth fueled by need for better software
Research Laboratories : Microsoft, Google, Siemens, …
Conferences & Workshops : ICST, ISSRE, STEP workshop, MBT, AST, …
University Groups : FIT, GMU, UNL, …
University / Industry Partnerships : FedEx / U Memphis
Useful ways to build better software
Travel Information Flow
[Diagram: information exchanged between the Traveler and the Airline, Agent, Airport, Hotel, Organizers, Colleagues and Spouse. Edge labels: dates, airports, flights, seats, reservation, checkin, gate, schedule, schedules, rooms, room, phones, name, contact, receipt, meeting rooms, confirm, checkout.]
Travel Information Needed
[Diagram: the travel information flow between the Traveler and the Airline, Agent, Airport, Hotel, Organizers, Colleagues and Spouse.]
Information Needed
  –Conference schedule
  –Hotel address & phone number
  –Flight numbers and times
  –Gate numbers
  –Seat numbers
  –Hotel confirmation number
  –Hotel room number
  –Meeting rooms and times
  –Local contact information for colleagues
  –…
And after returning home … all this information is immediately discarded !
Current Method
Most of us accumulate this information from
  –Websites
  –Phone conversations
  –Personal conversations
  –Pieces of paper
And then try to organize and track it
  –In our heads
  –Random scraps of paper
  –Laboriously hand-entering data into hand-held devices
This is very 20th century analog …
21st Century Method
Data are sent to a hand-held device wirelessly
  –Hand-held device automatically organizes data into meaningful information
Information is presented to traveler when needed
Data are generated and sent through software in a service oriented architecture
Web Services
A Web Service is a program that offers services over the Internet to other software programs
  –Internet-based
  –Uses SOAP and XML
  –Peer-to-peer communication via message passing
Web service components can integrate dynamically, by finding other services during execution
Web services transmit data formatted in XML
What does a service oriented architecture look like ?
Web Service Architecture
[Diagram comparing three architectures: Client-server (server, clients), Web Apps (clients, internet, servers) and Web Services. Devices shown: laptop, PDA, workstation, cell phone.]
Web Service Technologies
[Diagram labels: WSDL Specification, Components, Wrapped Legacy Systems, Wrapped Applications, UDDI Registry, Services, Points to URL, SOAP / XML, Publish, Find, Bind.]
Messages are transmitted in XML
Why XML ?
Software components that pass data must agree on format, types, and organization
Web services have unique requirements :
  –Very loose coupling and dynamic integration
1970s style : P1 and P2 share data through a file
  –File storage
  –Un-documented format
  –Data saved in binary mode
  –Source not available
1980s style : P1 and P2 share a file through a wrapper module (WM)
  –File storage
  –Un-documented format
  –Data saved as plain text
  –Access through wrapper module
  –Data hard to validate
XML
Data is passed directly between components
XML allows for self-documenting data
2000s style : P1, P2 and P3 share an XML file through a parser, with schemas
  –P1, P2 and P3 can see the format, contents, and structure of the data
  –Data sharing is independent of type
  –Format is easy to understand
  –Grammars are defined in schemas
XML for Flight Example
XML messages are defined by grammars (schemas)
Schemas can define many kinds of types
Schemas include “facets,” which refine the grammar
[Example XML message; markup lost. Surviving values: USAir 2608 IAD CLT 10:50:00 12:11:]
Schemas define input spaces for software components
Input Space Grammars
Input Space : The set of allowable inputs to software
The input space can be described in many ways
  –User manuals
  –Unix man pages
  –Method signature / Collection of method preconditions
  –A language
Most input spaces can be described as grammars
Grammars are usually not provided, but creating them is a valuable service by the tester
  –Errors will often be found simply by creating the grammar
Using Input Grammars
Software should reject or handle invalid data
Programs often do this incorrectly
Some programs (rashly) assume all input data is correct
Even if it works today …
  –What about after the program goes through some maintenance changes ?
  –What about if the component is reused in a new program ?
Consequences can be severe …
  –The database can be corrupted
  –Users are not satisfied
  –Most security vulnerabilities are due to unhandled exceptions … from invalid data
Validating Inputs
Input Validation : Deciding if input values can be processed by the software
Before starting to process inputs, wisely written programs check that the inputs are valid
How should a program recognize invalid inputs ?
What should a program do with invalid inputs ?
If the input space is described as a grammar, a parser can check for validity automatically
  –This is very rare
  –It is easy to write input checkers – but also easy to make mistakes
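To illustrate checking validity from a grammar instead of hand-written checks, here is a minimal sketch in Python. It assumes a hypothetical one-line flight record (carrier, flight number, two airport codes, departure and arrival times); the record layout, the names `TIME`, `FLIGHT` and `parse_flight`, and the sample values are illustrative assumptions, not part of the talk. The input space is written once as a regular grammar and the regex engine acts as the parser.

```python
import re

# Assumed grammar for a time of day, hh:mm:ss on a 24-hour clock.
TIME = r"(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d"

# Assumed grammar for one flight record (hypothetical layout):
#   carrier number origin destination depart arrive
FLIGHT = re.compile(
    rf"(?P<carrier>[A-Za-z]+) (?P<number>\d{{1,4}}) "
    rf"(?P<origin>[A-Z]{{3}}) (?P<dest>[A-Z]{{3}}) "
    rf"(?P<depart>{TIME}) (?P<arrive>{TIME})"
)


def parse_flight(line: str):
    """Return the parsed fields, or None if the line is outside the grammar."""
    match = FLIGHT.fullmatch(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_flight("Demo 101 IAD CLT 08:15:00 09:40:00"))  # valid record
    print(parse_flight("Demo 101 IAD CLT 25:15:00 09:40:00"))  # invalid hour
```

Because the whole record must match the grammar, a malformed field anywhere rejects the input before any processing starts.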
Representing Input Domains
[Diagram: three domains]
  –goal : Desired inputs (goal domain)
  –specified : Described inputs (specified domain)
  –implemented : Accepted inputs (implemented domain)
Representing Input Domains
Goal domains are often irregular
Goal domain for credit cards †
  –First digit is the Major Industry Identifier
  –First 6 digits and length specify the issuer
  –Final digit is a “check digit”
  –Other digits identify a specific account
Common specified domain
  –First digit is in { 3, 4, 5, 6 } (travel and banking)
  –Length is between 13 and 16
Common implemented domain
  –All digits are numeric
† More details are on :
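The gap between the three credit card domains can be made concrete with a small Python sketch. It assumes the check digit is computed with the Luhn algorithm, which is the usual scheme for payment card numbers; the slide does not name the algorithm. The function names are hypothetical, and the goal-domain check is only an approximation because issuer prefixes and per-issuer lengths are not modeled.

```python
def luhn_valid(number: str) -> bool:
    """Return True if a digit string passes the Luhn check-digit test."""
    total = 0
    # Walk right to left; double every second digit, starting with the
    # digit immediately left of the check digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def in_implemented_domain(s: str) -> bool:
    # Common implemented domain: every character is a digit.
    return s.isascii() and s.isdigit()


def in_specified_domain(s: str) -> bool:
    # Common specified domain: first digit in {3, 4, 5, 6}, length 13 to 16.
    return in_implemented_domain(s) and s[0] in "3456" and 13 <= len(s) <= 16


def in_goal_domain_approx(s: str) -> bool:
    # Approximation of the goal domain: specified domain plus check digit.
    return in_specified_domain(s) and luhn_valid(s)


if __name__ == "__main__":
    for candidate in ("4111111111111111", "4111111111111112", "1234"):
        print(candidate,
              in_implemented_domain(candidate),
              in_specified_domain(candidate),
              in_goal_domain_approx(candidate))
```

Inputs such as "1234" are accepted by the implemented check but lie outside both the specified and the goal domain, which is the kind of mismatch the domain diagrams point at.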
Representing Input Domains
[Diagram: the goal domain, specified domain and implemented domain drawn as overlapping regions, with one region highlighted.]
This region is a rich source of software errors …
Testing Web Services
This form of testing allows us to focus on interactions among the components
A formal model of the XML grammar is used
The grammar is used to create valid as well as invalid tests
The grammar is mutated
The mutated grammar is used to generate new XML messages
The XML messages are used as test cases
XML Data Model Example
[Figure: example XML data model; surviving label: Built-in types]
XML Constraints – “Facets”
Boundary Constraints : maxOccurs, minOccurs, length, maxExclusive, maxInclusive, maxLength, minExclusive, minInclusive, minLength, totalDigits
Non-boundary Constraints : enumeration, use, fractionDigits, pattern, nillable, whiteSpace, unique
XML Data Model
An XML schema can be modeled as a tree T = (N, D, X, E, n_r)
  –N is a finite set of element and attribute nodes
  –D is a finite set of built-in and derived data types
  –X is a finite set of constraints
  –E is a finite set of edges
  –Edges are from N to N ∪ D, plus a constraint
  –n_r is the root node
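One way to picture the five parts of the tree model is as a small data structure. The following Python sketch is an assumed encoding, not the representation used in the research: `Edge`, `SchemaTree`, `is_well_formed` and the toy flight schema are hypothetical names chosen for illustration, and constraints are modeled simply as (facet, value) pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    parent: str              # a node in N
    child: str               # a node in N or a data type in D
    constraints: tuple = ()  # (facet, value) pairs whose facets come from X


@dataclass
class SchemaTree:
    nodes: set        # N: element and attribute nodes
    types: set        # D: built-in and derived data types
    constraints: set  # X: constraint (facet) names
    edges: list       # E: edges from N to N ∪ D, each with constraints
    root: str         # n_r: the root node

    def is_well_formed(self) -> bool:
        """Check that every edge respects the definition of E.

        This does not verify that the edges form a tree (no cycle check).
        """
        if self.root not in self.nodes:
            return False
        for e in self.edges:
            if e.parent not in self.nodes:
                return False
            if e.child not in (self.nodes | self.types):
                return False
            if any(facet not in self.constraints for facet, _ in e.constraints):
                return False
        return True


# Toy schema with hypothetical element names.
tree = SchemaTree(
    nodes={"flight", "carrier", "number"},
    types={"string", "int"},
    constraints={"minOccurs", "maxOccurs", "maxLength"},
    edges=[
        Edge("flight", "carrier", (("minOccurs", 1), ("maxOccurs", 1))),
        Edge("flight", "number", (("minOccurs", 1),)),
        Edge("carrier", "string", (("maxLength", 10),)),
        Edge("number", "int"),
    ],
    root="flight",
)

if __name__ == "__main__":
    print(tree.is_well_formed())
```

Keeping the constraints on the edges means a test generator can walk the edges and pick, for each one, a facet to satisfy or to violate.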
Generating Tests
Valid tests
  –Generate tests as XML messages by deriving strings from grammar
  –Take every production at least once
  –Take choices … maxOccurs = “unbounded” means use 0, 1 and more than 1
Invalid tests
  –Mutate the grammar in structured ways
  –Create XML messages that are “almost” valid
  –This explores the gray space in the Representing Input Domains diagram
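The rule for repetition choices can be sketched in a few lines of Python. This is an illustrative reading of the slide, with assumptions stated in the comments: the function names, the default of 3 for "more than 1", the filtering by minOccurs, and the handling of a bounded maxOccurs are not from the talk.

```python
def occurrence_choices(min_occurs: int, max_occurs, many: int = 3) -> list:
    """Return the repetition counts to exercise for one element.

    For maxOccurs="unbounded" the slide's rule is 0, 1 and more than 1.
    Counts below min_occurs are dropped so the tests stay valid (assumption).
    """
    if max_occurs == "unbounded":
        counts = [0, 1, max(many, min_occurs)]
    else:
        # Assumption: for a bounded element, exercise just the two bounds.
        counts = [min_occurs, int(max_occurs)]
    return sorted({c for c in counts if c >= min_occurs})


def render(tag: str, text: str, count: int) -> str:
    """Repeat one XML element `count` times."""
    return "".join(f"<{tag}>{text}</{tag}>" for _ in range(count))


if __name__ == "__main__":
    # Hypothetical element "seat" with minOccurs=0, maxOccurs="unbounded".
    for n in occurrence_choices(0, "unbounded"):
        print(n, render("seat", "12A", n))
```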
Mutation Operators
1. Nonterminal Replacement : Every nonterminal symbol in a production is replaced by other nonterminal symbols.
2. Terminal Replacement : Every terminal symbol in a production is replaced by other terminal symbols.
3. Terminal and Nonterminal Deletion : Every terminal and nonterminal symbol in a production is deleted.
4. Terminal and Nonterminal Duplication : Every terminal and nonterminal symbol in a production is duplicated.
These operators are designed to mimic common XML errors
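The four operators can be sketched on a single production as follows. This Python example is a toy illustration under stated assumptions: the grammar symbols (`Carrier`, `Number`, `Time`, the `<flight>` tags) and the function `mutate_production` are hypothetical, each mutant changes exactly one symbol, and this is not the tool used in the case studies.

```python
def mutate_production(rhs, nonterminals, terminals):
    """Return (operator, mutated right-hand side) pairs for one production.

    Each mutant applies one operator at one position of the right-hand side.
    """
    mutants = []
    for i, sym in enumerate(rhs):
        if sym in nonterminals:
            op, pool = "nonterminal replacement", nonterminals
        else:
            op, pool = "terminal replacement", terminals
        # Operators 1 and 2: replace the symbol by every other symbol of its kind.
        for other in sorted(pool - {sym}):
            mutants.append((op, rhs[:i] + [other] + rhs[i + 1:]))
        # Operator 3: delete the symbol.
        mutants.append(("deletion", rhs[:i] + rhs[i + 1:]))
        # Operator 4: duplicate the symbol in place.
        mutants.append(("duplication", rhs[:i + 1] + [sym] + rhs[i + 1:]))
    return mutants


# Toy grammar symbols (hypothetical).
NONTERMINALS = {"Carrier", "Number", "Time"}
TERMINALS = {"<flight>", "</flight>"}
production = ["<flight>", "Carrier", "Number", "</flight>"]

if __name__ == "__main__":
    for op, rhs in mutate_production(production, NONTERMINALS, TERMINALS):
        print(f"{op:24} {' '.join(rhs)}")
```

Each mutated production describes a slightly wrong grammar, and messages derived from it are the "almost valid" inputs used as invalid tests.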
Test Case Generation
A test case is an XML message
Tests are generated directly from mutated schemas
Constraints are “violated” systematically
  –Values beyond the boundary values : “maxLength=5” is violated by “abcdef”
  –Values outside the non-boundary constraints : “fractionDigits=2” “ ”
Multiple XML messages from the same schema
Messages are invalid, so a valid response is an error
  –False positives : Messages that are accidentally valid
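Systematic constraint violation can be sketched as a function from a facet and its bound to a value just outside it. The Python below is a minimal illustration for a handful of facets; the function name `violating_value` and the specific values it picks (other than the slide's “abcdef” for maxLength=5) are assumptions, and integer bounds are assumed throughout.

```python
from string import ascii_lowercase


def violating_value(facet: str, bound: int) -> str:
    """Return a lexical value that just violates the given facet."""
    if facet == "maxLength":
        # One character too long, e.g. maxLength=5 gives "abcdef".
        return "".join(ascii_lowercase[i % 26] for i in range(bound + 1))
    if facet == "minLength":
        if bound == 0:
            raise ValueError("minLength=0 cannot be violated")
        return "a" * (bound - 1)  # one character too short
    if facet == "maxInclusive":
        return str(bound + 1)     # just above the upper bound
    if facet == "minInclusive":
        return str(bound - 1)     # just below the lower bound
    if facet in ("maxExclusive", "minExclusive"):
        return str(bound)         # the excluded bound itself
    if facet == "fractionDigits":
        # One non-zero fraction digit more than allowed.
        return "1." + "1" * (bound + 1)
    raise ValueError(f"facet not covered by this sketch: {facet}")


if __name__ == "__main__":
    for facet, bound in [("maxLength", 5), ("maxInclusive", 100), ("fractionDigits", 2)]:
        print(facet, bound, violating_value(facet, bound))
```

Values sitting one step outside a boundary are the cheapest invalid inputs to generate, and an implementation that accepts them is showing a gap between its implemented and specified domains.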
Test Case Generation – Examples
[Example; schema and XML markup lost. Panels: Original Schema (Partial), XML from Original Schema, four Mutant XML messages.]
Mutants : value = “3”, value = “1”
Mutants : value = “100”, value = “2000”
Case Study 1
Small web service created at GMU
  –Three components : Mars robot, space station, ground control
  –Ground control is a three-tier web application
For invalid tests, correct behavior is an abnormal response
  –Receiver cannot process the data, responds with a fault
  –Receiver has a runtime exception
Three types of mutants
  –Deletion (D), Insertion (I), Constraint Violation (CV)
[Table; cell values lost. Columns: D, I, CV, Original, Total. Rows: XML Schemas, XML Messages, Abnormal Response, Normal Response.]
Only CV tests got normal responses
Case Study 2
From Web Services Interoperability Organization
  –Supply chain management
  –Seven XML schemas
  –Three were requests and used for invalid tests
Mutated Schemas (columns : Schema, D, I, CV) : Retailer14839 Warehouse6223 Manufacturer8323
XML Messages (columns : D, I, CV)
Fifteen faults inserted into the program
Seven faults found – all by CV tests
Analysis of Faults
8 faults not found
  –5 faults : Affected back-end log file
     Observability … log file was not seen
  –1 fault : Depends on inputs from the database
     Controllability … tests depend on XML, not DB
  –2 faults : Required specific values that were not used
All Deletion and Insertion tests were detected by the program
Discussion
Deleting and inserting parts of the grammar have little or no value
Observability and controllability are major problems with web services
  –This is well-documented with web applications
The constraints are much more useful for generating tests than deleting and inserting XML elements
Extensions Needed
Improve invalid test generation
  –Focus on constraint-based tests from tests
  –Expand mutation of constraints
Automatic test generation
  –Based on input space partitioning (category partitioning)
General problems
  –Dealing with observability
  –Dealing with controllability
Travel Info Flow – Web Services
[Diagram: the Traveler sends desired travel info through a web app and web service to the Organizers, Airline and Agent, and receives travel info and an initial schedule through web services. The Airport (checkin, gate info) and the Hotel (checkin, checkout, room key) are reached over wireless web service connections.]
Use web services instead of to plan trip
Connect wirelessly to web services during journey
Travel Info Flow – Web Services
Traveler will send requirements to travel agent and hotel
Information will be sent to traveler’s web service, which will store it on the traveler’s hand-held device
Traveler will check in at the airport by beaming data from hand-held
  –Airport will send gate information and a map to the hand-held
Traveler will check in and out of hotel by beaming data from hand-held
  –Hotel will send hotel room, map, and electronic key to hand-held
Conference organizers will send meeting organization details to room computer, which will sync with hand-held
Room computer will send contact details to spouse
Trusting Web Services
For widespread adoption, users must be confident web services are
  –Reliable
  –Secure
  –Dependable
  –Usable
Conclusions
This mode of operation will be more convenient and efficient
  –Reduces the need to laboriously hand-translate information from one device or format to another
All of the technologies to support these interactions are available
We are missing some engineering
  –Reliable web services
  –Secure data transmission
  –Usable interfaces
Testing addresses some, but not all, of these issues
Contact
Jeff Offutt