Bhanu Pratap Gupta, Devang Vira, S. Sudarshan. Dept. of Computer Science and Engineering, IIT Bombay.


1 Bhanu Pratap Gupta, Devang Vira, S. Sudarshan. Dept. of Computer Science and Engineering, IIT Bombay

2
• Complex SQL queries are hard to get right
• Question: How to check if an SQL query is correct?
• Formal verification is not applicable, since we do not have a separate specification and implementation
• State-of-the-art solution: generate test databases and check whether the query gives the intended result

3
• Automated test data generation
• Based on database constraints and the SQL query
  ▪ Agenda [Chays et al., STVR04]: a tool that generates test cases for database applications, additionally using user-supplied heuristics
• Ensuring the query result is not empty
  ▪ Reverse Query Processing [Binnig et al., ICDE07] takes the desired query output and generates relation instances
  ▪ Handles a subset of Select/Project/Join/GroupBy queries
• None of the above guarantees anything about detecting errors in SQL queries
• Question: How do you model SQL errors?
• Answer: Query mutation

4
• Mutant: variation of the given query
• Mutations model common programming errors, like
  ▪ Join used instead of outerjoin (or vice versa)
  ▪ Join/selection condition errors
  ▪ < vs. <=, missing or extra condition
  ▪ Wrong aggregate (min vs. max)
• Mutant may be the intended query

5
• Traditional use of mutation testing has been to check coverage of a dataset
• Generate mutants of the original program by modifying the program in a controlled manner
• A dataset kills a mutant if the query and the mutant give different results on the dataset
• A dataset is considered complete if it can kill all non-equivalent mutants of the given query
• Prior work:
  ▪ Tuya and Suarez-Cabal [IST07], Chan et al. [QSIC05] defined a class of SQL query mutations
  ▪ Shortcoming: do not address test data generation
• Our goal: generate a dataset for testing the query
  ▪ Test dataset and query result on the dataset are shown to a human, who verifies that the query result is what is expected given this dataset
  ▪ Note that we do not need to actually generate and execute mutants
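The "kills" relation on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' tool: the table `r(a)`, the helper names `result` and `kills`, and the two datasets are assumptions, and the mutation used is the min vs. max aggregate error mentioned earlier in the talk.

```python
import sqlite3

def result(rows, sql):
    """Load `rows` into a fresh in-memory table r(a) and return the sorted query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE r(a INTEGER)")
    conn.executemany("INSERT INTO r VALUES (?)", [(v,) for v in rows])
    out = sorted(conn.execute(sql).fetchall())
    conn.close()
    return out

def kills(rows, query, mutant):
    """A dataset kills a mutant if query and mutant give different results on it."""
    return result(rows, query) != result(rows, mutant)

QUERY = "SELECT MIN(a) FROM r"
MUTANT = "SELECT MAX(a) FROM r"   # wrong-aggregate mutation

weak_dataset = [5, 5]     # min and max coincide, so the mutant survives
strong_dataset = [5, 7]   # min and max differ, so the mutant is killed

print(kills(weak_dataset, QUERY, MUTANT))
print(kills(strong_dataset, QUERY, MUTANT))
```

The weak dataset shows why an arbitrary test database is not enough: the query looks correct on it even if the wrong aggregate was written.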

6
• Address the problem of test data generation for killing non-equivalent mutants
• Equivalent mutants: r(A,B) ⋈ s(B,C) and r(A,B) ⟕ s(B,C), where r.B is a foreign key to s and is not null, will always produce the same result set
• Define class of:
  ▪ Join/outerjoin mutations
  ▪ Selection predicate mutations
• Algorithm for test data generation that kills all non-equivalent mutants in the above class
  ▪ Under some simplifying assumptions (given in the paper)
  ▪ With the guarantee that generated datasets are small and realistic, to aid in human verification of results
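The equivalent-mutant example can be checked on a small database. The sketch below reads the two expressions as an inner join and a left outer join of r and s (the join symbols are an interpretation of the slide), and the sample tuples are made up for illustration. With a non-null foreign key, no r tuple can be left without a matching s tuple, so no legal dataset separates the two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE s(b INTEGER PRIMARY KEY, c INTEGER);
    CREATE TABLE r(a INTEGER, b INTEGER NOT NULL REFERENCES s(b));
    INSERT INTO s VALUES (1, 10), (2, 20);
    INSERT INTO r VALUES (100, 1), (101, 2), (102, 2);
""")

inner = conn.execute(
    "SELECT r.a, r.b, s.c FROM r JOIN s ON r.b = s.b ORDER BY r.a").fetchall()
left = conn.execute(
    "SELECT r.a, r.b, s.c FROM r LEFT JOIN s ON r.b = s.b ORDER BY r.a").fetchall()
print(inner == left)   # every r tuple has a partner in s

# The only way to tell the two joins apart is a dangling r tuple,
# and the foreign key constraint rejects it.
try:
    conn.execute("INSERT INTO r VALUES (103, 9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```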

7
• Join type mutations: an occurrence of a join operator (⋈, ⟕, ⟖, ⟗) is replaced by one of the other join operators
• Defining join mutations in SQL is complicated by the absence of a particular join order
  ▪ SELECT * FROM a,b,c WHERE (a.x = b.x) and (b.x = c.x)
• We consider all relational algebra expressions (trees) equivalent (under inner join reordering) to the given SQL query
• We consider join type mutations to single join nodes in each tree above
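Single-node join type mutation of one tree can be sketched as follows. The tuple encoding of a join tree and the function name `mutants` are assumptions made for this illustration, and the sketch mutates one given tree only; enumerating all trees equivalent under inner join reordering is not shown.

```python
JOIN_OPS = ("INNER", "LEFT OUTER", "RIGHT OUTER", "FULL OUTER")

def mutants(tree):
    """Yield every tree that differs from `tree` in the operator of exactly one join node.

    A tree is either a relation name (str) or a tuple (op, left, right).
    """
    if isinstance(tree, str):
        return                      # a leaf has no join operator to mutate
    op, left, right = tree
    for other in JOIN_OPS:          # mutate this node
        if other != op:
            yield (other, left, right)
    for m in mutants(left):         # or mutate one node in the left subtree
        yield (op, m, right)
    for m in mutants(right):        # or mutate one node in the right subtree
        yield (op, left, m)

# One possible tree for: SELECT * FROM a,b,c WHERE (a.x = b.x) and (b.x = c.x)
tree = ("INNER", ("INNER", "a", "b"), "c")
all_mutants = list(mutants(tree))
for m in all_mutants:
    print(m)
```

Each of the two join nodes can take three other operators, so this tree has six single-node mutants.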

8
• Case I: Mutation at root node, with no foreign key constraints
• Schema: r(A), s(B)
• To kill this mutant: ensure that for an r tuple there is no matching s tuple
• Generated test case: r(A)={(1)}; s(B)={}
• Basic idea: (a) run query on given database, (b) from result extract matching tuples for r and s, (c) delete s tuple to ensure no matching tuple for r
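The test case on this slide can be replayed directly. The transcript does not preserve the query and mutant expressions, so the sketch assumes the query is an inner join of r and s on r.A = s.B and the mutant replaces it with a left outer join; the starting database with one matching pair is also made up.

```python
import sqlite3

# Assumed query and mutant; the slide's own expressions are not in the transcript.
QUERY = "SELECT r.a, s.b FROM r JOIN s ON r.a = s.b"
MUTANT = "SELECT r.a, s.b FROM r LEFT JOIN s ON r.a = s.b"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r(a INTEGER);
    CREATE TABLE s(b INTEGER);
    INSERT INTO r VALUES (1);   -- (a)/(b): a matching pair taken from the given database
    INSERT INTO s VALUES (1);
""")

def both(conn):
    """Return (query result, mutant result) on the current database state."""
    return conn.execute(QUERY).fetchall(), conn.execute(MUTANT).fetchall()

before = both(conn)
conn.execute("DELETE FROM s WHERE b = 1")   # (c): remove the matching s tuple
after = both(conn)

print(before)   # identical results: the mutant survives
print(after)    # results differ: r(A)={(1)}, s(B)={} kills the mutant
```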

9
• Case II: Extra join above mutated node
• Schema: r(A,B), s(C,D), t(E)
• To kill this mutant we must ensure that for an r tuple there is no matching s tuple, but there is a matching t tuple
• Generated test case: r(A,B)={(1,2)}; s(C,D)={}; t(E)={(2)}
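A sketch of why the matching t tuple is needed. The join conditions are not preserved in the transcript; the code assumes r.A = s.C for the mutated node and r.B = t.E for the join above it, which is consistent with the generated test case but remains a guess.

```python
import sqlite3

# Assumed join conditions (not given in the transcript): r.a = s.c and r.b = t.e.
QUERY = "SELECT * FROM r JOIN s ON r.a = s.c JOIN t ON r.b = t.e"
MUTANT = "SELECT * FROM r LEFT JOIN s ON r.a = s.c JOIN t ON r.b = t.e"

def results(t_rows):
    """Build r={(1,2)}, s={} and the given t, then run query and mutant."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE r(a INTEGER, b INTEGER);
        CREATE TABLE s(c INTEGER, d INTEGER);
        CREATE TABLE t(e INTEGER);
        INSERT INTO r VALUES (1, 2);
    """)
    conn.executemany("INSERT INTO t VALUES (?)", t_rows)
    out = conn.execute(QUERY).fetchall(), conn.execute(MUTANT).fetchall()
    conn.close()
    return out

with_t = results([(2,)])   # the slide's test case
without_t = results([])    # same data, but no matching t tuple

print(with_t)
print(without_t)
```

Without the t tuple, the join above the mutated node discards the null-padded row, so both queries return nothing and the mutant survives.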

10
• Given join expression on relations r_1, r_2, …, r_n
• Create dataset where all relations have a set of matching tuples
• For each relation r_i, generate a dataset where rest of relations match, but r_i is empty
  ▪ Unless making r_i empty makes join graph disconnected
• Above procedure kills all join type mutations of given inner join tree
• Outer joins complicate picture when attributes are projected out
  ▪ May have to make more than one r_i empty at a time
• Foreign keys may prevent making some r_i empty
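The procedure above might be sketched as below for the simplest setting: inner joins only, no foreign keys, no projection. The function names, the dictionary representation of a dataset, and the sample relations are all hypothetical, and the disconnection rule is implemented under one reading of the slide (skip r_i when the remaining relations no longer form a connected join graph).

```python
def connected(nodes, edges):
    """True if the undirected graph restricted to `nodes` is connected."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            v = b if a == u else a if b == u else None
            if v is not None and v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def generate_datasets(matching, edges):
    """matching: relation name -> tuples that all join with each other.
    edges: pairs of relation names that share a join condition."""
    datasets = [{n: list(rows) for n, rows in matching.items()}]  # all relations match
    for name in matching:
        rest = [n for n in matching if n != name]
        if not connected(rest, edges):
            continue                  # emptying this relation disconnects the join graph
        ds = {n: list(rows) for n, rows in matching.items()}
        ds[name] = []                 # rest of the relations match, this one is empty
        datasets.append(ds)
    return datasets

# Hypothetical chain join r - s - t.
matching = {"r": [(1, 2)], "s": [(2, 3)], "t": [(3,)]}
edges = [("r", "s"), ("s", "t")]
datasets = generate_datasets(matching, edges)
emptied = [[n for n, rows in ds.items() if not rows] for ds in datasets]
print(emptied)
```

In the chain example, emptying s would leave r and t with no join condition between them, so only r and t are emptied in turn.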

11
• Case III: Mutation at root node with foreign key constraints and selection on right side
• Schema: r(A), s(B,C)
• Foreign key: r.A → s.B
• To kill this mutant we must create an s tuple which matches with the r tuple on the foreign key reference, but which has s.C ≠ 4
• Generated test case: r(A)={(2)}; s(B,C)={(2,5)}
• Notion of valid nullable pattern defined in paper specifies which relations can be made null/non-matching, given foreign key constraints and join graph
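A sketch of this case on the slide's test data. The query itself is missing from the transcript; the code assumes it joins r with the selection s.C = 4 on r.A = s.B, and that the mutant turns the join into a left outer join. The derived-table alias `s4` is only a naming choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE s(b INTEGER PRIMARY KEY, c INTEGER);
    CREATE TABLE r(a INTEGER NOT NULL REFERENCES s(b));
    INSERT INTO s VALUES (2, 5);   -- matches r on the foreign key, but has c <> 4
    INSERT INTO r VALUES (2);
""")

# Assumed query: r joined with the selection c = 4 on s (not given in the transcript).
SELECTED_S = "(SELECT b, c FROM s WHERE c = 4) AS s4"
QUERY = f"SELECT r.a, s4.b, s4.c FROM r JOIN {SELECTED_S} ON r.a = s4.b"
MUTANT = f"SELECT r.a, s4.b, s4.c FROM r LEFT JOIN {SELECTED_S} ON r.a = s4.b"

# The Case I trick (delete the s tuple) is not allowed here.
try:
    conn.execute("DELETE FROM s WHERE b = 2")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

query_rows = conn.execute(QUERY).fetchall()
mutant_rows = conn.execute(MUTANT).fetchall()
print(delete_blocked, query_rows, mutant_rows)
```

The s tuple satisfies the foreign key but fails the selection, so r has no join partner after the selection and the two join types give different results.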

12
• Implemented using Java and PostgreSQL
• Creates datasets by extracting and modifying tuples from given database
• Currently handles join type mutation and selection predicate mutation
• For creating a merged dataset
  ▪ Tuples having same values for join attributes must be blocked from being inserted again
• Handling selection predicate mutation
  ▪ E.g. to distinguish r.A < 3 and r.A <= 3 we generate tuples with r.A = 2 and 3
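The selection predicate example can be checked with a short script. The helper `select_a` and the single-column table are assumptions for illustration; the values 2 and 3 are the ones named on the slide.

```python
import sqlite3

def select_a(values, predicate):
    """Load `values` into r(a) and return the a values satisfying `predicate`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE r(a INTEGER)")
    conn.executemany("INSERT INTO r VALUES (?)", [(v,) for v in values])
    rows = [row[0] for row in
            conn.execute(f"SELECT a FROM r WHERE {predicate} ORDER BY a")]
    conn.close()
    return rows

# Only the tuple with a = 2: both predicates accept it, so the mutant survives.
only_two = (select_a([2], "a < 3"), select_a([2], "a <= 3"))

# Tuples with a = 2 and a = 3: the boundary value separates the two predicates.
two_and_three = (select_a([2, 3], "a < 3"), select_a([2, 3], "a <= 3"))

print(only_two)
print(two_and_three)
```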

13
• Ongoing work:
  ▪ Synthetic data generation taking database and query constraints into account, which is non-trivial
  ▪ Idea (from RQP [Binnig et al., ICDE07]): use a model checker to generate data
  ▪ Under implementation using CVC3
• Extend the technique to handle aggregations and sub-queries
• Future work: data generation for application code with multiple queries

14 Questions

15
• The problem "is Q equivalent to a mutant Q'?" can be reduced to query containment, and vice versa, in polynomial time
• The Chase algorithm can be used to generate datasets to show that Q and Q' are not equivalent (for SPJ queries and several extensions)
  ▪ Such a dataset would kill the mutant Q'
  ▪ Limited work on outerjoin containment data generation
• However, we don't want to enumerate each mutant and generate separate datasets
  ▪ Too expensive

16
• Under the following conditions we can generate merged datasets:
  ▪ Tuples having same values for join attributes must be blocked from being inserted again
  ▪ The query must not contain any equality selection on a unique key
  ▪ The result of the query must contain one or more attributes which together form a unique key for any relation
  ▪ Also, attributes from the result forming a unique key must be guaranteed to be non-null in the result

17
• Consider the three relations:
  ▪ Student(rollno, name, deptcode, progcode)
  ▪ Department(deptcode, deptname)
  ▪ Program(progcode, progname)
• And a query:
  SELECT rollno, name, deptname, progname
  FROM student s
    INNER JOIN department d ON s.deptcode = d.deptcode
    INNER JOIN program p ON s.progcode = p.progcode

18
• Generate mutants by mutating join operator of a single node for all above trees
[Figure: Query Tree 1, Query Tree 2, Query Tree 3]

19 Generated data shows:
• A student (Devang) with valid program and department
• A student (Abhijeet) with invalid department
• A student (Sandeep) with invalid program
• A student (Aditya) with invalid program and department
• A program (PhD) with no student
• A department (Mechanical) with no student

Department
Deptcode  Deptname
CS        Computer
CH        Chemical
ME        Mechanical

Program
Progcode  Progname
0         B.Tech
1         M.Tech
2         PhD

Student
Rollno  Name      progcode  deptcode
501     Devang    1         CS
401     Abhijeet  0         CE
701     Sandeep   5         CH
101     Aditya    4         MA
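The generated data above can be loaded and used to compare the student query with two of its join type mutants. This is a sketch rather than the authors' implementation: the column types, the `TEMPLATE` string, and the choice of mutants are assumptions, and only left outer join mutants are shown, since support for RIGHT and FULL joins depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department(deptcode TEXT, deptname TEXT);
    CREATE TABLE program(progcode INTEGER, progname TEXT);
    CREATE TABLE student(rollno INTEGER, name TEXT, progcode INTEGER, deptcode TEXT);
    INSERT INTO department VALUES ('CS','Computer'), ('CH','Chemical'), ('ME','Mechanical');
    INSERT INTO program VALUES (0,'B.Tech'), (1,'M.Tech'), (2,'PhD');
    INSERT INTO student VALUES (501,'Devang',1,'CS'), (401,'Abhijeet',0,'CE'),
                               (701,'Sandeep',5,'CH'), (101,'Aditya',4,'MA');
""")

TEMPLATE = """
    SELECT rollno, name, deptname, progname
    FROM student s
    {dept_join} JOIN department d ON s.deptcode = d.deptcode
    {prog_join} JOIN program p ON s.progcode = p.progcode
    ORDER BY rollno
"""

def run(dept_join, prog_join):
    """Run the student query with the given join types for the two join nodes."""
    sql = TEMPLATE.format(dept_join=dept_join, prog_join=prog_join)
    return conn.execute(sql).fetchall()

original = run("INNER", "INNER")
dept_mutant = run("LEFT", "INNER")   # department join mutated to left outer join
prog_mutant = run("INNER", "LEFT")   # program join mutated to left outer join

print(original)
print(dept_mutant)
print(prog_mutant)
```

Abhijeet (invalid department) separates the original from the first mutant, and Sandeep (invalid program) separates it from the second, which matches the roles the slide assigns to those tuples.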

20 Generated data shows:
• A student (Devang) with valid program and department
• A program (B.Tech) with no student
• A department (Electrical) with no student

Department
Deptcode  Deptname
CS        Computer
EE        Electrical

Program
Progcode  Progname
0         B.Tech
1         M.Tech

Student
Rollno  Name    progcode  deptcode
501     Devang  1         CS

Foreign keys are:
  Student.progcode → Program.progcode
  Student.deptcode → Department.deptcode

21 Case of no foreign keys

