Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay

 Complex SQL queries hard to get right  Question: How to check if an SQL query is correct?  Formal verification is not applicable since we do not have a separate specification and an implementation  State of the art solution: Generate test databases and check if the query gives the intended result 2

 Automated Test Data generation  Based on database constraints, and SQL query ▪ Agenda [Chays et al., STVR04], a tool which generates test cases for database applications which additionally uses user fed heuristics  Ensuring query result is not empty ▪ Reverse Query Processing [Binning et al., ICDE07] takes desired query output and generates relation instances ▪ Handle a subset of Select/Project/Join/GroupBy queries  None of the above guarantee anything about detecting errors in SQL queries  Question: How do you model SQL errors?  Answer: Query Mutation 3

 Mutant: Variation of the given query  Mutations model common programming errors, like ▪ Join used instead of outerjoin (or vice versa) ▪ Join/selection condition errors ▪ < vs. <=, missing or extra condition ▪ Wrong aggregate (min vs. max)  Mutant may be the intended query 4

 Traditional use of mutation testing has been to check coverage of dataset  Generate mutants of the original program by modifying the program in a controlled manner  A dataset kills a mutant if query and the mutant give different results on the dataset  A dataset is considered complete if it can kill all non-equivalent mutants of the given query  Prior work:  Tuya and Suarez-Cabal [IST07], Chan et al. [QSIC05] defined a class of SQL query mutations  Shortcoming: do not address test data generation  Our goal: generated dataset for testing query  Test dataset and query result on the dataset are shown to human, who verifies that the query result is what is expected given this dataset  Note that we do not need to actually generate and execute mutants 5

 Address the problem of test data generation for killing non-equivalent mutants  Equivalent Mutants: r(A,B) s(B,C) and r(A,B) s(B,C) where r.B is a foreign key to s, and is not null will always produce the same resultset  Define class of:  Join/outerjoin mutations  Selection predicate mutations  Algorithm for test data generation that kills all non- equivalent mutants in above class  Under some simplifying assumptions (given in the paper)  With the guarantee that generated datasets are small and realistic, to aid in human verification of results 6

 Join type mutations: An occurrence of a join operator (,,, ) is replaced by one of the other join operators  Defining join mutations in SQL is complicated by the absence of a particular join order  SELECT * FROM a,b,c WHERE (a.x = b.x) and (b.x = c.x)  We consider all relational algebra expressions (trees) equivalent (under inner join reordering) to the given SQL query  We consider join type mutations to single join nodes in each tree above 7

 Case I: Mutation at root node, with no foreign key constraints  Schema: r(A), s(B)  To kill this mutant: ensure that for an r tuple there is no matching s tuple  Generated test case: r(A)={(1)}; s(B)={}  Basic idea: (a) run query on given database, (b) from result extract matching tuples for r and s (c) delete s tuple to ensure no matching tuple for r 8

 Case II: Extra join above mutated node  Schema: r(A,B), s(C,D), t(E)  To kill this mutant we must ensure that for an r tuple there is no matching s tuple, but there is a matching t tuple  Generated test case: r(A,B)={(1,2)}; s(C,D)={}; t(E)={(2)} 9

 Given join expression on relations r 1, r 2, …, r n  Create dataset where all relations have a set of matching tuples  For each relation r i, generate a dataset where rest of relations match, but r i is empty ▪ Unless making r i empty makes join graph disconnected  Above procedure kills all join type mutations of given inner join tree  Outer joins complicate picture when attributes are projected out ▪ May have to make more than one r i empty at a time  Foreign keys may prevent making some r i empty 10

 Case III: Mutation at root node with foreign key constraints and selection on right side  Schema: r(A), s(B,C)  Foreign key: r.A → s.B  To kill this mutant we must create an s tuple which matches with the r tuple on the foreign key reference, but which has s.C ≠ 4  Generated test case: r(A)={(2)}; s(B,C)={(2,5)}  Notion of valid nullable pattern defined in paper specifies which relations can be made null/non-matching, given foreign key constraints and join graph 11

 Implemented using Java and PostgreSQL  Creates datasets by extracting and modifying tuples from given database  Currently handles join type mutation and selection predicate mutation  For creating a merged dataset ▪ Tuples having same values for join attributes must be blocked from being inserted again  Handling selection predicate mutation ▪ Eg. to distinguish r.A < 3 and r.A <= 3 we generate tuples with r.A = 2 and 3 12

 Ongoing work :  Synthetic data generation taking database and query constraints into account which is non trivial ▪ Idea (from RQP [Binning et al ICDE07]): Use a model checker to generate data ▪ Under implementation using CVC3  Extend the technique to handle aggregations and sub-queries  Future work: data generation for application code with multiple queries 13

Questions

 Problem: is Q equivalent to a mutant Q‘ can be reduced to query containment and vice versa in polynomial time  The Chase algorithm can be used to generate datasets to show that Q and Q' are not equivalent (for SPJ queries and several extensions)  such a dataset would kill the mutant Q‘  limited work on outerjoin containment data generation  However we don't want to enumerate each mutant and generate separate datasets  too expensive 15

 Under the following conditions we can generate merged datasets:  Tuples having same values for join attributes must be blocked from being inserted again  The query must not contain any equality selection on an unique key  The result of the query must contain one or more attributes which together form an unique key for any relation  Also attributes from the result forming an unique key must be guaranteed to be non-null in the result 16

 Consider the three relations :  Student(name, deptcode, progcode),  Department(deptcode, deptname)  Program(progcode, progname)  And a query:  SELECT rollno, name, deptname, progname FROM student s INNER JOIN department d ON s.deptcode=d.deptcode INNER JOIN program p ON s.progcode=p.progcode 17

 Generate mutants by mutating join operator of a single node for all above trees Query Tree 1 Query Tree 2Query Tree 3 18

Generated data shows :  A student (Devang) with valid program and department  A student (Abhijeet) with invalid department  A student (Sandeep) with invalid program  A student (Aditya) with invalid program invalid program and department  A program (PhD) with no student  A department (Mechanical) with no student DeptcodeDeptname CSComputer CHChemical MEMechanical ProgcodeProgname 0B.Tech 1M.Tech 2PhD RollnoNameprogcodedeptcode 501Devang1CS 401Abhijeet0CE 701Sandeep5CH 101Aditya4MA DepartmentProgramStudent 19

Generated data shows :  A student (Devang) with valid program and department  A program (B.Tech) with no student  A department (Electrical) with no student DeptcodeDeptname CSComputer EEElectrical ProgcodeProgname 0B.Tech 1M.Tech RollnoNameprogcodedeptcode 501Devang1CS DepartmentProgramStudent Foreign Keys are: Student.progcode → Program.progcode Student.deptcode → Department.deptcode 20

Case of no foreign keys 21

Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Similar presentations

Presentation on theme: "Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Similar presentations

Presentation on theme: "Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay."— Presentation transcript:

Similar presentations

About project

Feedback