Multi-RQP: Generating Test Databases for the Functional Testing of OLTP Applications
Carsten Binnig
Joint work with: Donald Kossmann, Eric Lo
DBTest Workshop, SIGMOD 2008, Vancouver
2 Motivation
Today's testing techniques are not very efficient
  – 20-70% of the costs of a software project are spent on testing
  – Costs caused by software errors in the US in 2000: ~$20-$60 bn
Test automation is not a trivial problem
  – Writing test programs that verify the application behavior
  – Keeping test programs maintainable and of high quality is not easy
Test automation for DB apps is even harder
  – Testing a certain behavior needs a particular database state
  – Existing tools generate test databases independently of the test cases
How to generate relevant test databases for OLTP apps? (other work: [ICDE07], [SIGMOD07])
3 Example: Online-Library (1)
Use Case: Reservation of a book
1. User enters the inventory number of the book
2. System shows the details of the book
   Exception 1: Book belongs to the closed stacks
3. User enters her account information
4. System checks the account information and finishes the reservation
   Exception 2: User account is disabled
   Exception 3: User account has overdue fines
4 Example: Online-Library (2)
Test Case (Expected Behavior) + Test Database
Test Case 1 (Exception 1):
  – A book that belongs to the closed stacks
Test Case 2 (Exception 2):
  – A book that does not belong to the closed stacks +
  – A user account that is disabled
Test Case 3 (Exception 3): …
Test Case 4 (Successful Reservation):
  – A book that does not belong to the closed stacks +
  – A valid user account with no overdue fines
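The link between database state and expected behavior can be made concrete with a small sketch. The function `reserve`, the helper `make_db`, the column `u_disabled`, and all column types are hypothetical and chosen only for illustration; the slides give just the use-case steps and the column names `b_closed` and `u_fines`.

```python
import sqlite3

def make_db(book_closed, user_disabled, user_fines):
    """Build a one-book, one-user test database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (b_id INTEGER PRIMARY KEY, "
                 "b_title VARCHAR(20), b_closed BOOLEAN NOT NULL)")
    conn.execute('CREATE TABLE "user" (u_id INTEGER PRIMARY KEY, '
                 "u_name VARCHAR(20), u_disabled BOOLEAN NOT NULL, "
                 "u_fines INTEGER NOT NULL)")
    conn.execute("INSERT INTO book VALUES (1, ?, ?)", ("Title A", book_closed))
    conn.execute('INSERT INTO "user" VALUES (1, ?, ?, ?)',
                 ("User A", user_disabled, user_fines))
    return conn

def reserve(conn, b_id, u_id):
    """Walk through the reservation use case and report its outcome."""
    book = conn.execute("SELECT b_closed FROM book WHERE b_id = ?",
                        (b_id,)).fetchone()
    if book is None:
        return "unknown book"
    if book[0] == 1:
        return "exception 1: closed stacks"
    user = conn.execute('SELECT u_disabled, u_fines FROM "user" WHERE u_id = ?',
                        (u_id,)).fetchone()
    if user is None:
        return "unknown user"
    if user[0] == 1:
        return "exception 2: account disabled"
    if user[1] > 0:
        return "exception 3: overdue fines"
    return "reserved"

# One database state per test case, in the order of the slide.
outcomes = [
    reserve(make_db(1, 0, 0), 1, 1),  # Test Case 1
    reserve(make_db(0, 1, 0), 1, 1),  # Test Case 2
    reserve(make_db(0, 0, 5), 1, 1),  # Test Case 3
    reserve(make_db(0, 0, 0), 1, 1),  # Test Case 4
]
```

Each test case reaches its branch only because the database was prepared for it, which is the dependency the following slides address.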
5 Outline
Introduction
State of the Art / Requirements
(Multi-) Reverse Query Processing (RQP)
Conclusions / Future Work
6 State of the Art
General-purpose Database Generators => Random data over the database schema (size of tables, data distributions)
  – Low test coverage => Data does not enable execution of all test cases
  – High maintenance costs => Manual adaptation of data necessary
Script-based Database Generators => Application-specific data (e.g., a set of SQL INSERT statements)
  – High initial costs => Writing code to generate the test database
  – Hard to extend => Side effects are hard to analyze
Data Extractors => Extract data from existing applications
  – Test coverage strongly depends on existing data
  – High initial effort => Test data needs to be anonymized
7 Requirements for the Generation of Test Databases
Specify a test database individually per test case
  – High test coverage is possible
  – Good extensibility for new test cases
Allow a declarative specification of the test database
  – Maintainability of the specification is good (automatic evolution?)
  – Data generation can be optimized (runtime, amount of data)
Specify only the relevant data for each test case
  – Initial costs to specify the test database are low
  – Changeability of the specification is good
Enable logical data independence of the test data specification
  – Database schema can be changed without changing the specification of the test database
9 Reverse Query Processing (RQP)
Problem Statement:
  – Given: SQL Query Q, Result R, Database Schema S
  – Output: Database D such that Q(D) = R and D satisfies S
Example (Test Case 1): There exists at least one book that belongs to the closed stacks
Q: SELECT COUNT(*) AS cnt FROM book WHERE b_closed = 1
R: {<cnt >= 1>}
S: CREATE TABLE book ( b_id INTEGER PRIMARY KEY, b_title VARCHAR (20), b_closed BOOLEAN NOT NULL, ... )
D (output of RQP):
  b_id  b_title  b_closed  …
  1     Title A  1         …
  2     Title B  0         …
[Figure labels: Test Case 1, RQP, Application]
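The two conditions of the problem statement can be checked mechanically. The following is a minimal sketch using SQLite from the Python standard library, not the authors' tooling: the helper names `evaluate`, `is_valid` and `at_least_one` are made up for illustration, the elided schema columns are left out, and the acceptance check follows the example's wording "at least one book".

```python
import sqlite3

# Schema S, query Q and candidate database D from the slide
# (the columns elided with "..." on the slide are omitted here).
S = """CREATE TABLE book (
    b_id INTEGER PRIMARY KEY,
    b_title VARCHAR(20),
    b_closed BOOLEAN NOT NULL
)"""
Q = "SELECT COUNT(*) AS cnt FROM book WHERE b_closed = 1"
D = [(1, "Title A", 1), (2, "Title B", 0)]

def evaluate(schema, rows, query):
    """Load rows under the schema and run the query; raises on violations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany("INSERT INTO book VALUES (?, ?, ?)", rows)
    return conn.execute(query).fetchall()

def is_valid(schema, rows, query, accept):
    """True if the rows satisfy the schema and the query result is accepted."""
    try:
        result = evaluate(schema, rows, query)
    except sqlite3.IntegrityError:
        return False  # D does not satisfy S
    return accept(result)

def at_least_one(result):
    """Acceptance check for the example: the single cnt value is >= 1."""
    return result[0][0] >= 1

print(is_valid(S, D, Q, at_least_one))
```

A candidate with a NULL in `b_closed`, or one without any closed-stack book, is rejected by the same check.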
10 RQP Basic Idea
Query Processing:
  – Input: Database D, Query Q
  – Output: Result R
Reverse Query Processing:
  – Input: Query Q, Result R
  – Output: Database D
[Figure: Database + Query -> Query Processor -> Result; Result + Query -> Reverse Query Processor -> Database]
=> RQP can generate many different databases
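That many databases fit the same query and result can be seen by running one query over several hand-written candidates. This is an illustrative sketch only; the candidate rows and the helper `run` are invented, and the query is the closed-stacks count used elsewhere in the deck.

```python
import sqlite3

Q = "SELECT COUNT(*) AS cnt FROM book WHERE b_closed = 1"

def run(rows):
    """Forward query processing: load one candidate database and evaluate Q."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (b_id INTEGER PRIMARY KEY, "
                 "b_title VARCHAR(20), b_closed BOOLEAN NOT NULL)")
    conn.executemany("INSERT INTO book VALUES (?, ?, ?)", rows)
    return conn.execute(Q).fetchone()[0]

# Three different databases (made-up rows) that all map to the same result.
candidates = [
    [(1, "Title A", 1)],
    [(1, "Title A", 1), (2, "Title B", 0)],
    [(7, "Title X", 0), (8, "Title Y", 1), (9, "Title Z", 0)],
]
results = [run(rows) for rows in candidates]
print(results)
```

Because the forward mapping is many-to-one, a reverse query processor has to pick one of the admissible databases.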
11 Query Processing (Simplified)
SQL Query:
Q: SELECT COUNT(*) AS cnt FROM book WHERE b_closed = 1
Query Plan (Relational Algebra):
  χ COUNT(*) as cnt
  σ b_closed = 1
  book
D (table book):
  b_id  b_title  b_closed  …
  1     Title A  1         …
  2     Title B  0         …
Output of σ b_closed = 1:
  b_id  b_title  b_closed  …
  1     Title A  1         …
R (Query Result):
  cnt
  1
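The plan can be traced by hand with two tiny operators over lists of dictionaries. This is a teaching sketch of the selection and aggregation steps, with invented function names `select` and `count_star`, not how a real query processor is implemented.

```python
# Table book from the slide, one dictionary per tuple.
book = [
    {"b_id": 1, "b_title": "Title A", "b_closed": 1},
    {"b_id": 2, "b_title": "Title B", "b_closed": 0},
]

def select(rows, predicate):
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def count_star(rows, name):
    """Aggregation (chi) for COUNT(*): emit a single tuple with the count."""
    return [{name: len(rows)}]

# The plan is evaluated from the table upwards: book -> sigma -> chi.
selected = select(book, lambda r: r["b_closed"] == 1)
R = count_star(selected, "cnt")
print(selected)
print(R)
```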
12 Reverse Query Processing (Query Compilation)
SQL Query:
Q: SELECT COUNT(*) AS cnt FROM book WHERE b_closed = 1
Reverse Query Plan (Reverse Relational Algebra):
  χ⁻¹ COUNT(*) as cnt
  σ⁻¹ b_closed = 1
  book
Data flow: from χ⁻¹ at the top down to book
13 Reverse Query Processing (Top-Down Data Generation)
Q (reverse query plan):
  χ⁻¹ COUNT(*) as cnt
  σ⁻¹ b_closed = 1
  book
R: {<cnt >= 1>}
S: CREATE TABLE book ( b_id INTEGER PRIMARY KEY, b_title VARCHAR (20) NOT NULL, b_closed BOOLEAN NOT NULL, ... )
Data flow (top-down):
R:
  cnt
  1
Output of χ⁻¹:
  b_id  b_title  b_closed  …
  1     Title A  1         …
Output of σ⁻¹:
  b_id  b_title  b_closed  …
  1     Title A  1         …
D:
  b_id  b_title  b_closed  …
  1     Title A  1         …
  2     Title B  0         …
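The top-down data flow can be imitated with three toy reverse operators. This sketch is an assumption-laden simplification and not the authors' algorithm: the function names are invented, it handles only COUNT(*) over a single equality predicate, it starts from the concrete count 1 shown in the data flow, and it fills the remaining columns with arbitrary values that respect the primary key and NOT NULL constraints.

```python
from itertools import count

def reverse_count(result, name):
    """Reverse of COUNT(*): one <cnt = n> tuple becomes n empty tuples."""
    (row,) = result  # the aggregate result holds exactly one tuple
    return [{} for _ in range(row[name])]

def reverse_select_eq(rows, column, value):
    """Reverse of an equality selection: bind the column to the value."""
    return [{**r, column: value} for r in rows]

def reverse_scan_book(rows):
    """Fill remaining book columns: unique b_id, non-null b_title (arbitrary)."""
    ids = count(1)
    table = []
    for r in rows:
        b_id = next(ids)
        table.append({"b_id": b_id,
                      "b_title": f"Title {b_id}",
                      "b_closed": r["b_closed"]})
    return table

R = [{"cnt": 1}]
D = reverse_scan_book(reverse_select_eq(reverse_count(R, "cnt"), "b_closed", 1))

# Forward check: evaluating the query over D must reproduce R.
check = [{"cnt": sum(1 for r in D if r["b_closed"] == 1)}]
print(D, check == R)
```

The database on the slide also contains a second book with `b_closed = 0`; extra tuples that do not match the predicate leave the result unchanged, and this sketch simply does not generate any.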
14 Multi Reverse Query Processing
Problem:
  – One query + result is often not sufficient to specify a test database for more complex test cases
  – Multiple queries + results are necessary
Example (Test Case 4):
  – Book that does not belong to the closed stacks (Q1 and R1) and
  – Valid user account without overdue fines (Q2 and R2)
Idea: Restrict input query classes such that …
  – MRQP can be solved efficiently by using RQP
  – User can still specify any test database
15 RQP-disjoint Queries
Idea: Q1/R1 and Q2/R2 specify disjoint data sets
Example (Test Case 4):
  – Book that does not belong to the closed stacks (Q1 and R1) and
  – Valid user account without overdue fines (Q2 and R2)
Q1: SELECT COUNT(*) AS cnt FROM book WHERE b_closed=0
R1: {<cnt = 1>}
Q2: SELECT COUNT(*) AS cnt FROM user WHERE u_fines=0
R2: {<cnt = 1>}
D1 (table book):
  b_id  b_title  b_closed  …
  1     Title A  0         …
D2 (table user):
  u_id  u_name  u_fines  …
  1     User A  0        …
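Because the two queries touch different tables, each one can be handled on its own and the results loaded side by side. The sketch below assumes a count of 1 for both results (matching the single row shown in D1 and D2), assumes column types for the `user` table, and uses a made-up helper `rows_for_count` in place of a real RQP run.

```python
import sqlite3

def rows_for_count(n, fixed_value, label):
    """Stand-in for one RQP run: n rows whose last column equals fixed_value."""
    return [(i, f"{label} {i}", fixed_value) for i in range(1, n + 1)]

# Each query/result pair is processed independently of the other.
D1 = rows_for_count(1, 0, "Title")  # for Q1 on book, b_closed = 0
D2 = rows_for_count(1, 0, "User")   # for Q2 on user, u_fines = 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (b_id INTEGER PRIMARY KEY, "
             "b_title VARCHAR(20), b_closed BOOLEAN NOT NULL)")
# Column types of the user table are assumed; the name is quoted defensively.
conn.execute('CREATE TABLE "user" (u_id INTEGER PRIMARY KEY, '
             "u_name VARCHAR(20), u_fines INTEGER NOT NULL)")
conn.executemany("INSERT INTO book VALUES (?, ?, ?)", D1)
conn.executemany('INSERT INTO "user" VALUES (?, ?, ?)', D2)

# Both queries hold on the combined database.
r1 = conn.execute(
    "SELECT COUNT(*) AS cnt FROM book WHERE b_closed = 0").fetchone()[0]
r2 = conn.execute(
    'SELECT COUNT(*) AS cnt FROM "user" WHERE u_fines = 0').fetchone()[0]
print(r1, r2)
```

Since neither data set can change the other query's result, the union of D1 and D2 satisfies both specifications at once.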
16 Query-Refinement
Idea: Q2/R2 specifies a subset of the data specified by Q1/R1
Example (Test Case for Use Case Book Search):
  – Ten books of author Grisham (Q1 and R1)
  – One of these books should belong to the closed stacks (Q2 and R2)
Q1: SELECT COUNT(*) AS cnt FROM book WHERE b_author='Grisham'
R1: {<cnt = 10>}
Q2: SELECT COUNT(*) AS cnt FROM book WHERE b_author='Grisham' AND b_closed = 1
R2: {<cnt = 1>}
D2 (table book, generated from Q2/R2):
  b_id  b_author  b_closed  …
  1     Grisham   1         …
Q1 (rewritten so that it is disjoint from Q2): SELECT COUNT(*) AS cnt FROM book WHERE b_author='Grisham' AND b_closed <> 1
R1 (rewritten): {<cnt = 9>}
D1 (table book, generated from the rewritten Q1/R1):
  b_id  b_author  b_closed  …
  2     Grisham   0         …
  …     …         …         …
  10    Grisham   0         …
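The rewrite can be checked end to end: generate the refined subset first, generate the remainder for the rewritten query, and confirm that the original two queries hold on the union. This is a hand-rolled sketch under the example's numbers (ten Grisham books, one in the closed stacks); the variable names are invented and the rows are written directly rather than produced by a reverse query processor.

```python
import sqlite3

total, closed = 10, 1  # ten Grisham books, one of them in the closed stacks

# D2 covers the refinement Q2 (b_author = 'Grisham' AND b_closed = 1).
D2 = [(i, "Grisham", 1) for i in range(1, closed + 1)]
# D1 covers the rewritten Q1 (b_closed <> 1), whose count is total - closed.
D1 = [(i, "Grisham", 0) for i in range(closed + 1, total + 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (b_id INTEGER PRIMARY KEY, "
             "b_author VARCHAR(20), b_closed BOOLEAN NOT NULL)")
conn.executemany("INSERT INTO book VALUES (?, ?, ?)", D2 + D1)

# The original, unrewritten queries are evaluated over the union.
q1 = conn.execute(
    "SELECT COUNT(*) AS cnt FROM book WHERE b_author='Grisham'").fetchone()[0]
q2 = conn.execute(
    "SELECT COUNT(*) AS cnt FROM book "
    "WHERE b_author='Grisham' AND b_closed = 1").fetchone()[0]
print(q1, q2)
```

The rewritten query and Q2 select disjoint tuples, so the two data sets can again be generated independently.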
18 Conclusions / Future Work
Problems of existing test database generators
  – Test databases are generated independently of test cases
  – Low test coverage, high maintenance costs, …
(M)RQP to specify and generate test databases
  – Test data specification: declarative, minimal, …
  – Data generation based on database techniques (e.g., algebra operators, …)
Open Research Problems
  – Evolution of test databases
  – Study usability of MRQP
  – …
19 Questions?