A Framework for Testing Database Applications
David Chays, Polytechnic University, Brooklyn, NY
Joint work with Phyllis G. Frankl (Polytechnic), Saikat Dan (Polytechnic), Filippos Vokolos (Lucent Technologies), Elaine J. Weyuker (AT&T Labs - Research)
Motivation
– Database systems play an important role in virtually every modern organization
– Faults can be very costly
– Programmers/testers may lack experience and/or time
– Little attention has been paid to the correctness of DB application programs
Outline of Talk
– Background
– Aspects of DB system correctness
– Issues in testing DB application programs
– Architecture of tool set
– Tool for generating database states
– Additional issues and approaches
DBMS and DB application
– DB application, e.g., a C program with embedded SQL
– Database Management System (DBMS), through which the application accesses the DB
– DB schema, e.g., Emp(ssn, name, addr, sal), Dept(id, dept-name)
Relational databases
– Data is viewed as a collection of relations:
  – relation schema
  – relation (relation state)
– Tables, tuples, attributes, constraints; for example:
  create table S (ssn char(11) primary key, name char(25) not null)
– [Table S: attributes ssn and name; rows for Johnson, Smith, Jones, Blake]
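You can try out the constraint vocabulary directly. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in DBMS, because the talk does not name one. The ssn values are made up for illustration.

```python
import sqlite3

# SQLite is an assumed stand-in for the DBMS; table S is the slide's example.
conn = sqlite3.connect(":memory:")
conn.execute("create table S (ssn char(11) primary key, name char(25) not null)")

def try_insert(ssn, name):
    """Return True if the DBMS accepts the tuple, False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("insert into S values (?, ?)", (ssn, name))
        return True
    except sqlite3.IntegrityError:
        return False

# Made-up ssn values, for illustration only.
print(try_insert("123-45-6789", "Johnson"))  # True
print(try_insert("123-45-6789", "Smith"))    # False: duplicate primary key
print(try_insert("987-65-4321", None))       # False: name is declared not null
```

DBMSs also differ in small ways. SQLite, unlike the SQL standard, lets a non-INTEGER primary key column hold NULL unless that column is also declared not null. A generator that relies on the DBMS to reject bad tuples inherits such quirks.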
Aspects of Correctness
– Does the DBMS perform all operations correctly?
– Is concurrent access handled correctly?
– Is the system fault-tolerant?
– ...
– Does the application program behave as intended?
Traditional vs. DB programs
– Traditional program (imperative nature): input → function → output
– DB program (declarative nature): input + DB state → function → output + DB state
Example of an Informal Specification
– Customer-feature table: customerID, address, features, ...
– Billing table: customerID, billing plan, ...
– Input: the customer ID and the name of the feature to which the customer wishes to subscribe
  – invalid ID: return code 0
  – feature unavailable in that area: return code 2
  – feature available but incompatible with existing features: return code 3
  – else: update the customer’s feature record, update the billing table, return code 1
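The following is one possible implementation of the specification, sketched in Python with sqlite3. The table layout (customer, subscription, availability, incompatible, billing) is a hypothetical stand-in for the customer-feature and billing tables. The informal specification does not say what happens when a customer asks for a feature they already have, so the sketch has to pick a behavior.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
SCHEMA = """
create table customer     (id integer primary key, area text not null);
create table subscription (id integer references customer, feature text,
                           primary key (id, feature));
create table availability (feature text, area text, primary key (feature, area));
create table incompatible (feature text, incompatible_feature text);
create table billing      (id integer references customer, feature text,
                           primary key (id, feature));
"""

def subscribe(conn, customer_id, feature):
    """Return the code required by the informal specification (0, 1, 2 or 3)."""
    row = conn.execute("select area from customer where id = ?",
                       (customer_id,)).fetchone()
    if row is None:
        return 0  # invalid ID
    (area,) = row
    available = conn.execute(
        "select 1 from availability where feature = ? and area = ?",
        (feature, area)).fetchone()
    if available is None:
        return 2  # feature unavailable in the customer's area
    clash = conn.execute(
        """select 1 from subscription s join incompatible i
             on (i.feature = ? and i.incompatible_feature = s.feature)
             or (i.incompatible_feature = ? and i.feature = s.feature)
           where s.id = ?""",
        (feature, feature, customer_id)).fetchone()
    if clash is not None:
        return 3  # incompatible with a feature the customer already has
    with conn:  # both updates commit together or not at all
        # The spec is silent about re-subscribing; this sketch makes it a no-op.
        conn.execute("insert or ignore into subscription values (?, ?)",
                     (customer_id, feature))
        conn.execute("insert or ignore into billing values (?, ?)",
                     (customer_id, feature))
    return 1

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA + """
insert into customer values (1, 'Brooklyn');
insert into availability values ('F1', 'Brooklyn');
insert into availability values ('F2', 'Brooklyn');
insert into incompatible values ('F1', 'F2');
""")
print(subscribe(conn, 99, "F1"), subscribe(conn, 1, "F3"),
      subscribe(conn, 1, "F1"), subscribe(conn, 1, "F2"))  # 0 2 1 3
```

Every return code depends on the contents of several tables, not just on the two user inputs.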
What are the Input/Output Spaces?
– Naïve approach:
  I = {customer-IDs} × {feature-names}
  O = {0, 1, 2, 3}
– More suitable approach:
  I = {customer-IDs} × {feature-names} × {database-states}
  O = {0, 1, 2, 3} × {database-states}
– Problem: the tester must control and observe the DB state
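Observing the DB state can be as simple as snapshotting every table before and after the run, so that a test case yields an (output, resulting state) pair. The sketch below assumes SQLite and a hypothetical application function that takes the connection plus the user input. The names snapshot, run_case and state_diff are illustrative.

```python
import sqlite3

def snapshot(conn):
    """Capture the DB state as {table name: rows}, sorted by repr so that
    rows containing NULL (None) never need to be compared directly."""
    names = [n for (n,) in conn.execute(
        "select name from sqlite_master where type = 'table' order by name")]
    return {n: sorted(conn.execute(f'select * from "{n}"').fetchall(), key=repr)
            for n in names}

def run_case(conn, app, user_input):
    """Run one test case and return (output, state before, state after)."""
    before = snapshot(conn)
    output = app(conn, *user_input)
    return output, before, snapshot(conn)

def state_diff(before, after):
    """Per table, the rows that were added and the rows that were removed."""
    diff = {}
    for name in sorted(set(before) | set(after)):
        old, new = set(before.get(name, [])), set(after.get(name, []))
        if old != new:
            diff[name] = (sorted(new - old, key=repr), sorted(old - new, key=repr))
    return diff

def add_feature(conn, cid, feature):
    """Toy application under test (hypothetical)."""
    return conn.execute("insert into features values (?, ?)",
                        (cid, feature)).rowcount

conn = sqlite3.connect(":memory:")
conn.execute("create table features (id integer, feature text)")
out, before, after = run_case(conn, add_feature, (7, "F1"))
print(out, state_diff(before, after))  # 1 {'features': ([(7, 'F1')], [])}
```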
DB Application Testing Goal
– Select “interesting” DB states along with user inputs that exercise “interesting” behavior
– Cover a wide variety of situations that could arise in practice
– Do so in a way that facilitates checking of the output to the user and of the resulting DB state
Situations to Explore
– Customer already subscribes to that feature
– Feature not available in customer’s area
– Feature available, but incompatible with other features the customer already has
– Feature available and compatible with existing features
– Customer doesn’t yet subscribe to any features
– ...
May involve interplay between several tables
– Table 1, incompatible features: (feature, incompatible_feature), e.g., F1, F2, ...
– Table 2, features available in various areas: (feature, area)
– Table 3, customers and features: (ID, area, F1, F2, ..., FN)
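Which situation a request falls into depends on all three tables at once. The following sketch shows a classifier that a tester could use to check whether a candidate DB state covers the situations listed earlier. The labels, the function names, and the choice to represent Table 3's F1 ... FN columns as a set of features are all assumptions.

```python
from typing import Dict, Set, Tuple

def situations(incompatible: Set[Tuple[str, str]],
               available: Set[Tuple[str, str]],
               customers: Dict[int, Tuple[str, Set[str]]],
               cid: int, feature: str) -> Set[str]:
    """Label which 'situations to explore' one (customer, feature) request hits."""
    area, has = customers[cid]
    labels = set()
    if not has:
        labels.add("no features yet")
    if feature in has:
        labels.add("already subscribes")
    if (feature, area) not in available:
        labels.add("unavailable in area")
    elif any((feature, f) in incompatible or (f, feature) in incompatible
             for f in has):
        labels.add("incompatible")
    else:
        labels.add("available and compatible")
    return labels

def covered(incompatible, available, customers, features):
    """Union of the situations reachable from one DB state over all requests."""
    result = set()
    for cid in customers:
        for feature in features:
            result |= situations(incompatible, available, customers, cid, feature)
    return result

incompatible = {("F1", "F2")}                     # Table 1
available = {("F1", "A"), ("F2", "A")}            # Table 2
customers = {1: ("A", {"F1"}), 2: ("B", set())}   # Table 3
print(sorted(covered(incompatible, available, customers, ["F1", "F2"])))
# ['already subscribes', 'available and compatible', 'incompatible',
#  'no features yet', 'unavailable in area']
```

Even this two-customer state reaches every listed situation. A state with a single customer could not.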
Will Live Data Suffice?
– It may not reflect a sufficiently wide variety of situations
– It may be difficult to find the situations of interest
– It may violate privacy or security constraints
Generating Synthetic Data
– A DB state is a collection of relation states, each of which is a subset of the Cartesian product of some domains
– Generating domain elements and gluing them together isn’t enough, since constraints must be honored
– We attempt to generate interesting data that obey integrity constraints, using the schema and user-supplied information
Architecture of tool set
[Figure: components State Generator, Input Generator, State Checker, Output Checker; artifacts DB schema, application source, suggestions from tester, application executable, user input, DB state, output, results]
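The figure does not fix how the components are invoked. The following shows one hypothetical wiring. Every name and signature is an assumption, and the checkers receive the user input, the initial state, and either the application's output or its final state.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Tuple

@dataclass
class Report:
    passed: int = 0
    failed: List[Any] = field(default_factory=list)

def run_suite(state_generator: Callable[[], Any],
              input_generator: Callable[[Any], Iterable[Any]],
              application: Callable[[Any, Any], Tuple[Any, Any]],
              state_checker: Callable[[Any, Any, Any], bool],
              output_checker: Callable[[Any, Any, Any], bool]) -> Report:
    """Hypothetical wiring of the four components named in the figure.
    `application` returns (output, new_state) and must not mutate the
    initial state, so every test case starts from the same generated state."""
    report = Report()
    initial = state_generator()
    for user_input in input_generator(initial):
        output, final = application(initial, user_input)
        ok_output = output_checker(user_input, initial, output)
        ok_state = state_checker(user_input, initial, final)
        if ok_output and ok_state:
            report.passed += 1
        else:
            report.failed.append(user_input)
    return report

# Toy stand-ins: the "DB state" is a frozenset of subscribed features.
report = run_suite(
    state_generator=lambda: frozenset({"F1"}),
    input_generator=lambda state: ["F1", "F2"],
    application=lambda state, f: (1 if f in state else 0, state),
    state_checker=lambda f, before, after: after == before,
    output_checker=lambda f, before, out: out == (1 if f in before else 0))
print(report)  # Report(passed=2, failed=[])
```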
DB state generator
– Input: DB schema (in SQL)
– Parses the schema to derive information about:
  – attributes
  – tables
  – constraints: uniqueness, not-NULL, referential integrity
– Inputs additional information from the user:
  – suggested attribute values, divided into groups, similar to Category-Partition Testing [Ostrand-Balcer]
  – additional annotations
Example Schema
create table s (sno char(5), sname char(20), status decimal(3), city char(15), primary key(sno));
create table p (pno char(6) primary key, pname char(20), color char(6), weight decimal(3), city char(15));
create table sp (sno char(5), pno char(6), qty decimal(5), primary key(sno,pno), foreign key(sno) references s, foreign key(pno) references p);
Two equivalent ways to declare a primary key, and their parse trees:
– create table s( sno char(5), primary key(sno) );
  Stmt → Create Stmt (Nodetag type = T_CreateStmt, relname = “s”)
    Column Definition (Nodetag type = T_ColumnDef, colname = “sno”, type name = “bpchar”, Constraints = NIL)
    Table Constraint (Nodetag type = T_Constraint, contype = CONSTR_PRIMARY, keys: T_IDENT name = “sno”)
– create table s( sno char(5) primary key );
  Stmt → Create Stmt (Nodetag type = T_CreateStmt, relname = “s”)
    Column Definition (Nodetag type = T_ColumnDef, colname = “sno”, type name = “bpchar”, Constraints: contype = CONSTR_PRIMARY)
– The tool must derive the same fact (sno is the primary key of s) from either tree
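The following toy illustration derives per-attribute constraint flags from either spelling. This regex-based parser handles only simple CREATE TABLE statements like those in the example schema, and it is not the parser the tool uses. Attribute and parse_schema are hypothetical names.

```python
import re
from dataclasses import dataclass

@dataclass
class Attribute:
    name: str
    table: str
    sql_type: str
    primary: bool = False
    unique: bool = False
    not_null: bool = False
    foreign: bool = False

def split_top_level(body):
    """Split a column/constraint list on commas that are not inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts

def parse_schema(sql):
    """Toy parser for simple CREATE TABLE statements (no quoting or comments)."""
    tables = {}
    for m in re.finditer(r"create\s+table\s+(\w+)\s*\((.*?)\)\s*;", sql, re.I | re.S):
        table, attrs, table_constraints = m.group(1).lower(), {}, []
        for part in split_top_level(m.group(2)):
            low = part.lower()
            if low.startswith(("primary key", "foreign key", "unique")):
                table_constraints.append(low)      # table-level form
                continue
            name, col_type, *rest = low.split()
            rest = " ".join(rest)                  # column-level constraints
            attrs[name] = Attribute(
                name, table, re.match(r"\w+", col_type).group(),
                primary="primary key" in rest,
                unique="primary key" in rest or "unique" in rest,
                not_null="not null" in rest,
                foreign="references" in rest)
        for c in table_constraints:
            cols = [x.strip() for x in re.search(r"\(([^)]*)\)", c).group(1).split(",")]
            for col in cols:
                if c.startswith("foreign key"):
                    attrs[col].foreign = True
                else:
                    attrs[col].primary |= c.startswith("primary key")
                    # Only a single-column key makes the column unique by itself.
                    attrs[col].unique |= len(cols) == 1
        tables[table] = list(attrs.values())
    return tables

form1 = "create table s( sno char(5), primary key(sno) );"
form2 = "create table s( sno char(5) primary key );"
print(parse_schema(form1) == parse_schema(form2))  # True
```

The parser marks primary-key columns unique only for single-column keys. For a composite key such as sp(sno, pno), only the combination is unique.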
Internal representation of the example schema (reached through globalTablePointer; cp points to each attribute’s list of choices; pr = primary key, un = unique, nn = not NULL, ~ = not):
Table P (5 attributes):
  pno | P | char | pr | un | ~nn
  pname | P | char | ~pr | ~un | ~nn
  color | P | char | ~pr | ~un | ~nn
  weight | P | dec | ~pr | ~un | ~nn
  city | P | char | ~pr | ~un | ~nn
Table S (4 attributes):
  sno | S | char | pr | un | ~nn
  sname | S | char | ~pr | ~un | ~nn
  status | S | dec | ~pr | ~un | ~nn
  city | S | char | ~pr | ~un | ~nn
Table SP (3 attributes; next table pointer = Null):
  sno | SP | char | pr | un | ~nn | foreign
  pno | SP | char | pr | un | ~nn | foreign
  qty | SP | dec | ~pr | ~un | ~nn
Each attribute also carries a row of seven flags, all F in this example.
Selecting Attribute Values
– The initial prototype queries the tester for suggested values and for guidance on how to use them
– Values may be partitioned into data groups (choices)
– The tester may specify probabilities for data groups
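Selecting a data group with tester-supplied probabilities maps naturally onto weighted random choice. The following sketch assumes that the weights are the choice_prob percentages and that a NULL probability is given as a fraction in [0, 1]. pick_value is a hypothetical name.

```python
import random

def pick_value(groups, null_prob=0.0, rng=random):
    """Pick NULL with probability null_prob (a fraction, by assumption);
    otherwise pick a data group according to its weight, then a value
    uniformly from that group. groups: list of (name, weight, values)."""
    if rng.random() < null_prob:
        return None, None
    group = rng.choices(groups, weights=[w for _, w, _ in groups], k=1)[0]
    return group[0], rng.choice(group[2])

city = [("domestic", 90, ["Brooklyn", "Florham-Park", "Middletown"]),
        ("foreign", 10, ["London", "Bombay"])]
rng = random.Random(42)
draws = [pick_value(city, rng=rng) for _ in range(10_000)]
print(sum(g == "domestic" for g, _ in draws))  # roughly 9000
```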
Example annotations naming three data groups for one attribute:
--choice_name: low
--choice_name: medium
--choice_name: high
Each category (column) can have a list of choices pointed to by cp, e.g., cp → low → high → medium.
DB table generation
– Tester specifies table sizes
– Tool generates tuples for insertion:
  – select a data group or NULL, guided by annotations
  – select a value from the data group, obeying constraints
  – keep track of values used
– Outputs a sequence of SQL insert statements
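The following is a simplified sketch of the generation loop. Values are drawn from flat per-column pools, and data groups and probabilities are omitted. The loop tracks key combinations it has already used, so primary keys stay unique and non-NULL. Foreign-key columns draw only from values already inserted into the parent table. The function names, the pools for weight and qty, and the use of SQLite to validate the output are all assumptions.

```python
import random
import sqlite3

def sql_literal(v):
    """Render a Python value as an SQL literal."""
    if v is None:
        return "NULL"
    if isinstance(v, (int, float)):
        return str(v)
    return "'" + str(v).replace("'", "''") + "'"

def generate_inserts(table, columns, key, pools, size, rng, max_tries=1000):
    """Draw up to `size` tuples for `table` from per-column value pools.
    Key values must be non-NULL and unique as a combination; the keys used
    so far are tracked in `used`. Returns (insert statements, rows)."""
    used, rows = set(), []
    for _ in range(max_tries):
        if len(rows) == size:
            break
        row = {c: rng.choice(pools[c]) for c in columns}
        k = tuple(row[c] for c in key)
        if None in k or k in used:
            continue  # would violate not-NULL or uniqueness of the key
        used.add(k)
        rows.append(row)
    stmts = [f"insert into {table} ({', '.join(columns)}) values "
             f"({', '.join(sql_literal(r[c]) for c in columns)});" for r in rows]
    return stmts, rows

rng = random.Random(7)
cities = ["Brooklyn", "Florham-Park", "Middletown", "London", "Bombay"]
s_sql, s_rows = generate_inserts(
    "s", ["sno", "sname", "status", "city"], ("sno",),
    {"sno": ["S1", "S2", "S3", "S4", "S5"],
     "sname": ["Smith", "Jones", "Blake", "Clark", "Adams", None],
     "status": [0, 1, None], "city": cities}, 4, rng)
p_sql, p_rows = generate_inserts(
    "p", ["pno", "pname", "color", "weight", "city"], ("pno",),
    {"pno": ["P1", "P2", "P3", "P4", "P5"],
     "pname": ["seats", "airbags", "dashboard", "doors", "wheels", "bumper", None],
     "color": ["blue", "green", "yellow"], "weight": [100, 300, 500],
     "city": cities}, 3, rng)
# Foreign keys: sp.sno and sp.pno come only from values already generated.
sp_sql, sp_rows = generate_inserts(
    "sp", ["sno", "pno", "qty"], ("sno", "pno"),
    {"sno": [r["sno"] for r in s_rows], "pno": [r["pno"] for r in p_rows],
     "qty": [10, 300, 5000]}, 6, rng)

SCHEMA = """
create table s (sno char(5), sname char(20), status decimal(3), city char(15), primary key(sno));
create table p (pno char(6) primary key, pname char(20), color char(6), weight decimal(3), city char(15));
create table sp (sno char(5), pno char(6), qty decimal(5), primary key(sno,pno),
                 foreign key(sno) references s, foreign key(pno) references p);
"""
conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # any violation would raise on insert
conn.executescript(SCHEMA + "\n".join(s_sql + p_sql + sp_sql))
print("\n".join(sp_sql))
```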
Input files for Parts-Supplier database
sno:
--choice_name: sno
S1 S2 S3 S4 S5
sname:
--choice_name: sname
Smith Jones Blake Clark Adams
pname:
--choice_name: interior
seats airbags dashboard
--choice_name: exterior
doors wheels bumper
city:
--choice_name: domestic
--choice_prob: 90
Brooklyn Florham-Park Middletown
--choice_name: foreign
--choice_prob: 10
London Bombay
pno:
--choice_name: pno
P1 P2 P3 P4 P5
status:
--choice_name: status
--null_prob:
color:
--choice_name: color
blue green yellow
weight:
--choice_name: weight
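Reading the annotation file can be sketched as a small line-oriented parser. The sketch assumes that an attribute line ends in a colon, that annotation lines have the form --key: value, and that values are separated by whitespace. parse_annotations and the structure it returns are hypothetical.

```python
def parse_annotations(text):
    """Parse tester annotations into
    {attribute: {"null_prob": float or None, "groups": [group, ...]}},
    where each group is {"name": str, "prob": float or None, "values": [str]}."""
    spec, attr, group = {}, None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("--"):
            key, _, val = line[2:].partition(":")
            key, val = key.strip(), val.strip()
            if key == "choice_name":
                group = {"name": val, "prob": None, "values": []}
                spec[attr]["groups"].append(group)
            elif key in ("choice_prob", "null_prob"):
                number = float(val) if val else None  # empty means "not given"
                if key == "null_prob":
                    spec[attr]["null_prob"] = number
                elif group is not None:
                    group["prob"] = number
        elif line.endswith(":"):
            attr, group = line[:-1], None             # start a new attribute
            spec[attr] = {"null_prob": None, "groups": []}
        elif group is not None:
            group["values"].extend(line.split())      # whitespace-separated values
        # values appearing before any --choice_name are ignored in this sketch
    return spec

SAMPLE = """
city:
--choice_name: domestic
--choice_prob: 90
Brooklyn Florham-Park Middletown
--choice_name: foreign
--choice_prob: 10
London Bombay
status:
--choice_name: status
--null_prob:
"""
spec = parse_annotations(SAMPLE)
print([(g["name"], g["prob"]) for g in spec["city"]["groups"]])
# [('domestic', 90.0), ('foreign', 10.0)]
```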
A database state produced by the tool
Table s
  sno | sname | status | city
  S1 | NULL | 0 | Brooklyn
  S2 | Smith | 1 | Florham-Park
  S3 | Jones | NULL | London
  S4 | Blake | NULL | Middletown
Table p
  pno | pname | color | weight | city
  P1 | NULL | blue | 100 | Brooklyn
  P2 | Seats | green | 300 | Florham-Park
  P3 | airbags | yellow | 500 | Middletown
Table sp
  sno | pno | qty
  S1 | P1 | 5000
  S1 | P2 | 300
  S1 | P3 | 10
  S2 | P1 | 6000
  S2 | P2 | 400
  S2 | P3 | 5000
  S3 | P1 | 20
  S3 | P2 | 300
  S3 | P3 | 30
  S4 | P1 | 6000
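A state checker can confirm that a generated state honors the schema. The following sketch assumes SQLite as the target DBMS. It loads the state above into a scratch database built from the example schema. check_state is a hypothetical name.

```python
import sqlite3

SCHEMA = """
create table s (sno char(5), sname char(20), status decimal(3), city char(15), primary key(sno));
create table p (pno char(6) primary key, pname char(20), color char(6), weight decimal(3), city char(15));
create table sp (sno char(5), pno char(6), qty decimal(5), primary key(sno,pno),
                 foreign key(sno) references s, foreign key(pno) references p);
"""

STATE = {  # the database state shown above
    "s": [("S1", None, 0, "Brooklyn"), ("S2", "Smith", 1, "Florham-Park"),
          ("S3", "Jones", None, "London"), ("S4", "Blake", None, "Middletown")],
    "p": [("P1", None, "blue", 100, "Brooklyn"),
          ("P2", "Seats", "green", 300, "Florham-Park"),
          ("P3", "airbags", "yellow", 500, "Middletown")],
    "sp": [("S1", "P1", 5000), ("S1", "P2", 300), ("S1", "P3", 10),
           ("S2", "P1", 6000), ("S2", "P2", 400), ("S2", "P3", 5000),
           ("S3", "P1", 20), ("S3", "P2", 300), ("S3", "P3", 30),
           ("S4", "P1", 6000)],
}

def check_state(schema, state):
    """Load a state into a scratch database. Key and NOT NULL violations raise
    sqlite3.IntegrityError on insert; foreign-key enforcement is left off while
    loading, and all foreign-key violations are collected afterwards."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    for table, rows in state.items():
        marks = ", ".join("?" * len(rows[0]))
        conn.executemany(f"insert into {table} values ({marks})", rows)
    # Each reported row is (child table, rowid, parent table, foreign key id).
    return conn.execute("pragma foreign_key_check").fetchall()

print(check_state(SCHEMA, STATE))  # []
```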
Related work
– Lyons-77, DB-Fill, TestBase
– Like our approach, they rely on the user to supply attribute values
– They do not handle integrity constraints as completely
– They require the tester to describe tables in a special-purpose language (rather than SQL)
Testing Techniques in DB literature
– Focus on DB system performance, rather than DB application correctness
– Benchmarks
– Performance of the SQL processor: generation of a large number of DML statements [Slutz]
– Generation of huge tables with given statistical properties [Gray et al.]
Summary
– Issues
– Framework
– Prototype
Future Work
– Refinement based on feedback from DB application developers/testers
– Other DB state generation heuristics:
  – boundary values
  – “missing” constraints
  – difficult SQL features
– Interplay between DB state and user inputs
– Checking the DB state after test execution
– Checking application outputs