32nd International Conference on Very Large Data Bases, September 12-15, 2006, Seoul, Korea
Efficient Detection of Empty Result Queries
Gang Luo, IBM T.J. Watson Research Center, luog@us.ibm.com

2 Empty Result Problem
A query returns an empty result set
The user is left unsure where to look next
Frequently encountered in interactive exploration of massive data sets
Our contribution: a method for quickly detecting empty result sets

3 Example Percentages of Empty Result Queries
In a Customer Relationship Management (CRM) application developed by IBM
  –18.07% (3,396 empty result queries out of 18,793 queries)
In a real estate application developed by IBM
  –5.75%
In a digital library application [JCM+00]
  –10.53%
In a bioinformatics application [RCP+98]
  –38%

4 Empty Result Queries May Not Finish Execution Quickly
Consider a query joining two relations
  –Query execution time is at least as long as the join time, whether or not the query result set is empty
Even if a query finishes in a few seconds on a lightly loaded RDBMS, it may take more than a minute on a heavily loaded RDBMS

5 Outline
Limitations of previous approaches
Fast detection method for empty result queries
Some experiments

6 Existing Solutions to the Empty Result Problem
Explain what leads to the empty result set
Automatically generalize the query so that the generalized query returns some answers

7 Limitations of Existing Solutions
Require domain-specific knowledge
Only apply to a restricted form of queries
Require an excessive amount of time
Give too many reasons why the result set is empty
Users cannot reuse each other's query results

9 Our Solution
Only consider a read-only environment
From the execution of previous queries, remember the query parts that lead to empty result sets
When a new query Q arrives, match it against the remembered query parts. If a match exists, report that Q will return an empty result set without executing Q
Exploits special properties of empty result sets, and is thus often more powerful than the traditional materialized view method

10 Definitions
Empty result propagating operator: an operator whose output is empty if any input is empty
Empty result propagating query: a query whose query plan contains only empty result propagating operators (our focus)
Query part: a sub-tree of a query plan
Atomic query part: an ordered pair (relation names R_N, selection condition S_C)
  –Corresponds to a relational algebra formula: first take the Cartesian product of all relations in R_N, then apply S_C
  –S_C is a conjunction of primitive terms, where each primitive term is a comparison

11 Definitions – Cont.
Cover: atomic query part P_1 = (R_N1, S_C1) covers atomic query part P_2 = (R_N2, S_C2) if
  –R_N1 ⊆ R_N2
  –Whenever S_C2 is true, S_C1 is true
Property: suppose atomic query part P_1 covers atomic query part P_2. For a given database, if the output of P_1 is empty, the output of P_2 is also empty.
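
The cover test can be sketched in Python. This is only an illustrative, sufficient (not complete) check, not the coverage checking algorithm from the paper. The `Interval` and `AtomicPart` types, the split of S_C into per-attribute ranges plus join equality terms, and all example values are assumptions made for this sketch.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interval:
    """Values allowed for one attribute; bounds may be open or closed."""
    lo: float = -math.inf
    hi: float = math.inf
    lo_open: bool = False
    hi_open: bool = False

    def contains(self, other: "Interval") -> bool:
        """True if every value allowed by `other` is also allowed by self."""
        lo_ok = self.lo < other.lo or (
            self.lo == other.lo and (other.lo_open or not self.lo_open))
        hi_ok = self.hi > other.hi or (
            self.hi == other.hi and (other.hi_open or not self.hi_open))
        return lo_ok and hi_ok


@dataclass
class AtomicPart:
    """Hypothetical encoding of an atomic query part (R_N, S_C)."""
    relations: frozenset                         # R_N
    ranges: dict = field(default_factory=dict)   # attribute -> Interval
    joins: frozenset = frozenset()               # equality terms such as "A.c=B.d"


def covers(p1: AtomicPart, p2: AtomicPart) -> bool:
    """Sufficient test that p1 covers p2 (may miss some true covers)."""
    if not p1.relations <= p2.relations:   # R_N1 must be a subset of R_N2
        return False
    if not p1.joins <= p2.joins:           # every join term of S_C1 appears in S_C2
        return False
    for attr, iv1 in p1.ranges.items():    # every range term of S_C1 is implied by S_C2
        iv2 = p2.ranges.get(attr)
        if iv2 is None or not iv1.contains(iv2):
            return False
    return True


# P_1: 50<A.a<100 and A.c=B.d over {A, B}
p1 = AtomicPart(frozenset({"A", "B"}),
                {"A.a": Interval(50, 100, True, True)},
                frozenset({"A.c=B.d"}))
# P_2: 60<A.a<80 and C.f<300 and both join terms over {A, B, C}
p2 = AtomicPart(frozenset({"A", "B", "C"}),
                {"A.a": Interval(60, 80, True, True),
                 "C.f": Interval(hi=300, hi_open=True)},
                frozenset({"A.c=B.d", "B.g=C.h"}))
print(covers(p1, p2), covers(p2, p1))
```

If P_1 is known to be empty, the property above lets the stricter P_2 be declared empty as well, since it adds relations and tightens conditions.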

12 Given an Empty Result Query
Find the lowest-level query part P whose output is empty
Example plan (output cardinality in brackets):
  –A (table-scan) [40000] → σ 50<A.a<100 ∨ A.b=200 [200] → hash [200]
  –B (index-scan, B.e<40 ∨ B.e=50) [5000] → π [5000] → hash [5000]
  –hash join A.c=B.d [0] → π [0] → sort [0]
  –C (table-scan) [20000] → σ C.f<300 [1000] → sort [1000]
  –sort-merge join B.g=C.h [0] → π [0]
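
A minimal sketch of this search, assuming the executed plan is available as a tree annotated with output cardinalities. The `PlanNode` type and `lowest_empty_part` function are hypothetical names, and the projection operators of the example plan are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlanNode:
    """Hypothetical plan node: operator label, output row count, inputs."""
    op: str
    rows: int
    children: list = field(default_factory=list)


def lowest_empty_part(node: PlanNode) -> Optional[PlanNode]:
    """Return an empty-output node none of whose inputs is empty, or None."""
    if node.rows != 0:
        return None
    for child in node.children:
        found = lowest_empty_part(child)
        if found is not None:
            return found       # a deeper empty part exists below this node
    return node                # all inputs are non-empty, so this is the lowest


# The example plan (projections omitted).
a_side = PlanNode("hash", 200, [
    PlanNode("select 50<A.a<100 or A.b=200", 200, [PlanNode("table-scan A", 40000)])])
b_side = PlanNode("hash", 5000, [PlanNode("index-scan B (B.e<40 or B.e=50)", 5000)])
hash_join = PlanNode("hash join A.c=B.d", 0, [a_side, b_side])
c_side = PlanNode("sort", 1000, [
    PlanNode("select C.f<300", 1000, [PlanNode("table-scan C", 20000)])])
plan = PlanNode("sort-merge join B.g=C.h", 0, [PlanNode("sort", 0, [hash_join]), c_side])

print(lowest_empty_part(plan).op)
```

On this plan the search stops at the hash join: its output is empty while both of its inputs are not.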

13 Transforming P into a Simplified Query Part P_s
Drop all operators (e.g., projection, hash, sort) that have no influence on the emptiness of the output
Replace each physical join operator with a logical join operator
Replace each index-scan operator with a table-scan operator followed by a selection operator, where the selection condition is the index-scan condition

14 Transforming P into a Simplified Query Part P_s – Cont.
Corresponding relational algebra formula
  –(σ_{50<A.a<100 ∨ A.b=200}(A)) ⋈_{A.c=B.d} (σ_{B.e<40 ∨ B.e=50}(B))
Simplified plan: A (table-scan) → σ 50<A.a<100 ∨ A.b=200 and B (table-scan) → σ B.e<40 ∨ B.e=50, combined by ⋈ A.c=B.d
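
The three rewrite rules can be sketched as a recursive tree transformation. The `Node` type, the operator names in `NO_EFFECT` and `PHYSICAL_JOINS`, and the bracketed formula syntax printed by `to_algebra` are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical plan node. `arg` is a relation name or a condition;
    `cond` holds the index-scan condition when kind == "index-scan"."""
    kind: str
    arg: str = ""
    children: list = field(default_factory=list)
    cond: str = ""


NO_EFFECT = {"projection", "hash", "sort"}            # cannot change emptiness
PHYSICAL_JOINS = {"hash join", "sort-merge join", "nested-loop join"}


def simplify(node: Node) -> Node:
    """Apply the three rewrite rules bottom-up."""
    kids = [simplify(c) for c in node.children]
    if node.kind in NO_EFFECT:
        return kids[0]                                # drop the operator
    if node.kind in PHYSICAL_JOINS:
        return Node("join", node.arg, kids)           # physical -> logical join
    if node.kind == "index-scan":                     # index-scan -> scan + selection
        return Node("select", node.cond, [Node("table-scan", node.arg)])
    return Node(node.kind, node.arg, kids)


def to_algebra(node: Node) -> str:
    """Render a simplified plan as a relational algebra string."""
    if node.kind == "table-scan":
        return node.arg
    if node.kind == "select":
        return f"σ[{node.arg}]({to_algebra(node.children[0])})"
    left, right = (to_algebra(c) for c in node.children)
    return f"({left}) ⋈[{node.arg}] ({right})"


p = Node("hash join", "A.c=B.d", [
    Node("hash", children=[
        Node("select", "50<A.a<100 ∨ A.b=200", [Node("table-scan", "A")])]),
    Node("hash", children=[
        Node("index-scan", "B", cond="B.e<40 ∨ B.e=50")]),
])
print(to_algebra(simplify(p)))
```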

15 Breaking P_s into Atomic Query Parts
Get all selection conditions in the selection/join operators
Rewrite the conjunction of these selection conditions into disjunctive normal form (DNF)
  –Negations on numeric or string attributes are removed using complementary operators
  –An interval-based comparison is treated as a single primitive term
Generate a set of atomic query parts (R_N, S_C)
  –R_N: input relations of all table-scan operators in P_s
  –S_C: a term in the DNF

16 Breaking P_s into Atomic Query Parts – Cont.
Property: the following three assertions are equivalent to each other:
  –The output of the query part P is empty
  –The output of the simplified query part P_s is empty
  –The output of each generated atomic query part is empty
Atomic query parts generated for the example:
  –(σ_{50<A.a<100}(A)) ⋈_{A.c=B.d} (σ_{B.e<40}(B))
  –(σ_{A.b=200}(A)) ⋈_{A.c=B.d} (σ_{B.e<40}(B))
  –(σ_{50<A.a<100}(A)) ⋈_{A.c=B.d} (σ_{B.e=50}(B))
  –(σ_{A.b=200}(A)) ⋈_{A.c=B.d} (σ_{B.e=50}(B))
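
The DNF expansion behind these four atomic query parts can be sketched as follows. The sketch assumes the conditions have already been collected as a conjunction of clauses, each clause being a list of alternative primitive terms; it does not handle arbitrary nesting or negation. The function name `atomic_parts` and the string encoding of terms are hypothetical.

```python
from itertools import product


def atomic_parts(relations, clauses):
    """Expand a conjunction of disjunctive clauses into DNF terms.

    Each DNF term picks one primitive term from every clause; paired with
    the relation set R_N it forms one atomic query part (R_N, S_C).
    """
    r_n = frozenset(relations)
    return [(r_n, frozenset(terms)) for terms in product(*clauses)]


parts = atomic_parts(
    ["A", "B"],
    [
        ["50<A.a<100", "A.b=200"],   # selection on A (a disjunction)
        ["B.e<40", "B.e=50"],        # index-scan condition on B (a disjunction)
        ["A.c=B.d"],                 # join condition (a single term)
    ],
)
for r_n, s_c in parts:
    print(sorted(r_n), sorted(s_c))
```

Two clauses with two alternatives each and one single-term clause give 2 × 2 × 1 = 4 atomic query parts, matching the list above.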

17 Storing the Generated Atomic Query Parts
For each generated atomic query part P_a
  –Insert P_a into a collection C_aqp of atomic query parts
  –Remove from C_aqp all previously stored atomic query parts that are covered by P_a
See paper for details of the coverage checking algorithm
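
A minimal sketch of this insert-and-prune step. It assumes atomic query parts are encoded as (relation set, term set) pairs and uses a purely syntactic cover test (subset of relations and subset of terms), which is sound but weaker than the coverage checking algorithm in the paper. The names `covers` and `store_part` are hypothetical.

```python
def covers(p1, p2):
    """Syntactic cover test: p1's relations and terms are subsets of p2's.

    If every term of S_C1 literally appears in S_C2, then S_C2 implies S_C1.
    """
    (rel1, terms1), (rel2, terms2) = p1, p2
    return rel1 <= rel2 and terms1 <= terms2


def store_part(c_aqp, p_a):
    """Insert p_a into c_aqp and drop stored parts that p_a covers."""
    c_aqp[:] = [p for p in c_aqp if not covers(p_a, p)]
    c_aqp.append(p_a)


c_aqp = []
specific = (frozenset({"A", "B", "C"}),
            frozenset({"A.b=200", "A.c=B.d", "B.g=C.h"}))
general = (frozenset({"A", "B"}),
           frozenset({"A.b=200", "A.c=B.d"}))

store_part(c_aqp, specific)
store_part(c_aqp, general)    # covers `specific`, so `specific` is removed
print(len(c_aqp))
```

Pruning keeps only the most general known-empty parts, which bounds storage without losing detection power.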

18 When Getting a New Query Q
Break Q into a set of atomic query parts
For each such atomic query part P_a, check whether some atomic query part A_i in C_aqp covers P_a
If such an A_i exists for each P_a, report that Q will return an empty result set without executing Q
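
The detection step can be sketched in the same spirit. The (relation set, term set) encoding, the syntactic `covers` test, and the function name `is_provably_empty` are assumptions of this sketch, and the stored part is an invented example.

```python
def covers(p1, p2):
    """Syntactic cover test (sound but incomplete): subset of relations and terms."""
    (rel1, terms1), (rel2, terms2) = p1, p2
    return rel1 <= rel2 and terms1 <= terms2


def is_provably_empty(query_parts, c_aqp):
    """True only if every atomic part of Q is covered by some stored part."""
    if not query_parts:          # nothing to match, so nothing can be concluded
        return False
    return all(any(covers(a_i, p_a) for a_i in c_aqp) for p_a in query_parts)


# One stored known-empty part: A.b=200 joined with B on A.c=B.d.
c_aqp = [(frozenset({"A", "B"}), frozenset({"A.b=200", "A.c=B.d"}))]

# New query Q1: adds relation C and extra conditions, so it is at least as strict.
q1 = [(frozenset({"A", "B", "C"}),
       frozenset({"A.b=200", "A.c=B.d", "B.g=C.h", "C.f<300"}))]
# New query Q2: one of its two atomic parts is not covered.
q2 = q1 + [(frozenset({"A", "B", "C"}),
            frozenset({"50<A.a<100", "A.c=B.d", "B.g=C.h", "C.f<300"}))]

print(is_provably_empty(q1, c_aqp), is_provably_empty(q2, c_aqp))
```

A `False` result means only that emptiness could not be established from the stored parts; the query then has to be executed as usual.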

20 Setup
Testing environment
  –PostgreSQL 7.3.4
  –Windows XP OS
  –Dell Inspiron 8500 PC with one 2.2GHz CPU, 512MB memory, one 40GB disk
TPC-R benchmark
See paper for detection probability analysis

21 Overhead Experiment
Query Q_1: find the information about certain parts that were sold on certain days
select * from orders o, lineitem l
where o.orderkey=l.orderkey
  and (o.orderdate=d_1 or … or o.orderdate=d_e)
  and (l.partkey=p_1 or … or l.partkey=p_f);

22 Overhead Experiment – Cont.
Query Q_2: find the information about certain parts that were sold to certain customers on certain days
select * from orders o, lineitem l, customer c
where o.orderkey=l.orderkey and o.custkey=c.custkey
  and (o.orderdate=d_1 or … or o.orderdate=d_e)
  and (l.partkey=p_1 or … or l.partkey=p_f)
  and (c.nationkey=n_1 or … or c.nationkey=n_g);

23 Overhead Experiment – Cont.
The overhead of our method increases with both query complexity and the number of atomic query parts stored in C_aqp
The overhead is higher when the check fails than when it succeeds

24 Overhead Experiment – Cont. The overhead of our method is trivial compared to query execution overhead

25 Summary
Provide a fast detection method for empty result queries
  –Low overhead
  –High detection probability once enough information has been accumulated

26 Open Issues
In the presence of updates, correctly preserve as much stored information as possible
A hybrid method that combines the advantages of our method and the existing solutions
More aggressive storage-saving techniques
