Efficiently Processing Queries on Interval-and-Value Tuples in Relational Databases
Jost Enderle, Nicole Schneider, Thomas Seidl
RWTH Aachen University, Germany
VLDB 2005, Trondheim
Data Management and Exploration, Prof. Dr. Thomas Seidl

Outline
– Interval-and-Value (IaV) Data and Applications
– Relational Interval Tree (RI-tree)
– Managing Interval-and-Value Tuples Using RI-tree
– Experimental Results

Interval-and-Value Data: Example
Contracts table: stores the period and budget of contracts

    CREATE TABLE contracts (
      -- key:
      c_no     VARCHAR(10),
      -- simple-valued attribute:
      c_budget DECIMAL(10,2),
      -- interval:
      c_period ROW (c_start DATE, c_end DATE)
    )

[Example table: No. | Budget (k€) | Period (Start, End); five sample rows, values not preserved]

Interval-and-Value Data: Query
Sample query on the contracts table

    -- Find all contracts
    SELECT c_no
    FROM contracts
    -- within a certain budget range
    WHERE c_budget BETWEEN 500 AND 2000
    -- running during a certain time interval
    AND c_period OVERLAPS (DATE '…', DATE '…')

Special cases of this general Range-Interval query:
– Value-Interval Query: the value range is a single point
– Range-Stabbing Query: the query interval is a single point
– Value-Stabbing Query: both restrictions hold
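
The four query types differ only in whether each restriction degenerates to a single point. A minimal Python sketch of that classification follows. The class `IaVQuery`, its field names, and the use of closed integer intervals are assumptions made for illustration; the slides do not prescribe a representation, and SQL's OVERLAPS has its own boundary semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IaVQuery:
    """Hypothetical container for a query on interval-and-value tuples."""
    value_lo: float   # lower end of the value (budget) range
    value_hi: float   # upper end of the value (budget) range
    start: int        # lower end of the query interval
    end: int          # upper end of the query interval


def classify(q: IaVQuery) -> str:
    """Name the query type according to which restrictions are single points."""
    value_part = "Value" if q.value_lo == q.value_hi else "Range"
    interval_part = "Stabbing" if q.start == q.end else "Interval"
    return f"{value_part}-{interval_part}"


def matches(q: IaVQuery, value: float, start: int, end: int) -> bool:
    """Reference predicate: value in range and closed intervals intersect."""
    in_range = q.value_lo <= value <= q.value_hi
    overlaps = start <= q.end and q.start <= end
    return in_range and overlaps
```

For example, `classify(IaVQuery(500, 2000, 3, 6))` names the general Range-Interval case, while a query with `value_lo == value_hi` and `start == end` is a Value-Stabbing query.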

Motivation of Relational Indexing
Main Memory Structures
  – no persistency, no disk block structure
Secondary Storage Structures
  + persistency, high block-oriented efficiency
  – integration into the DBMS kernel typically not supported (GiST?)
Relational Storage Structures
  + basic idea: don't extend, just use the RDBMS (virtual storage machine)
  + sound formal foundation, little implementation effort
  + immediate industrial strength (availability, robustness, ACID, …)
  + high efficiency by exploiting built-in indexing structures (B+-tree)

Relational Interval Tree
[Kriegel, Pötke, Seidl: VLDB 2000], based on [Edelsbrunner 1980]

Two relational indexes (B+-trees) store the interval bounds; root = 2^(h-1).

lowerIndex (node, start, id):
    4,  1, C3
    8,  3, C4
    8,  5, C2
   12, 10, C1

upperIndex (node, end, id):
    4,  7, C3
    8, 13, C2
    8, 15, C4
   12, 15, C1

– Supported by any RDBMS: no modification of built-in B+-trees
– Optimal complexities for space, updates, and intersection queries
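
The node column of both indexes is the interval's fork node: the first node on the descent from the root that falls inside the interval. The following Python sketch computes it by bisection, following the RI-tree idea. The function name `fork_node`, the dictionary `CONTRACTS`, and the interval bounds are illustrative assumptions; the bounds are read off the index entries on the slide (start from lowerIndex, end from upperIndex), with root = 8.

```python
def fork_node(lower: int, upper: int, root: int) -> int:
    """Descend the virtual binary tree until a node lies in [lower, upper]."""
    node = root
    step = root // 2
    while step >= 1:
        if upper < node:        # interval lies entirely to the left
            node -= step
        elif node < lower:      # interval lies entirely to the right
            node += step
        else:                   # lower <= node <= upper: fork node found
            break
        step //= 2
    return node


# Assumed example data, reconstructed from the slide's index entries.
ROOT = 8  # root = 2^(h-1) with h = 4, covering the data space [1, 15]
CONTRACTS = {"C1": (10, 15), "C2": (5, 13), "C3": (1, 7), "C4": (3, 15)}

# Each interval is registered once per index, keyed by its fork node.
lower_index = sorted((fork_node(s, e, ROOT), s, cid)
                     for cid, (s, e) in CONTRACTS.items())
upper_index = sorted((fork_node(s, e, ROOT), e, cid)
                     for cid, (s, e) in CONTRACTS.items())

for row in lower_index:
    print("lowerIndex", row)
for row in upper_index:
    print("upperIndex", row)
```

Sorting the tuples mimics the order in which a composite B+-tree on (node, bound, id) would store them.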

Single Interval Query Processing
Two steps to process an interval query:
1. Transform the interval query into a set of range queries
   – The generated queries are collected in transient tables (no I/Os)
2. Perform a single SQL query
   – Join the transient query tables with the relational indexes

Preprocessing: Generate Query Ranges
Generate a set of range queries for lowerIndex and upperIndex:
– At nodes left of start: report entries i with i.end ≥ start (upperIndex; example nodes 32, 48, 52)
– At nodes right of end: report entries i with i.start ≤ end (lowerIndex; example node 56)
– For nodes between start and end: report all entries (example range 54 to …)
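
The node lists for the two transient tables can be collected by walking from the root towards each query bound. The sketch below is one way to do this in Python; the helper names `path_to` and `query_nodes` are hypothetical. The example call assumes a root of 32 and a query interval [54, 55]; these values are not stated on the slide and were chosen because they yield the node lists shown there.

```python
def path_to(root: int, target: int) -> list[int]:
    """Nodes visited when bisecting from the root down to `target`."""
    node, step = root, root // 2
    path = [node]
    while node != target and step >= 1:
        node = node - step if target < node else node + step
        step //= 2
        path.append(node)
    return path


def query_nodes(start: int, end: int, root: int) -> tuple[list[int], list[int]]:
    """Return (left_nodes, right_nodes) for the query interval [start, end].

    left_nodes:  nodes left of start, to be probed in upperIndex (end >= start)
    right_nodes: nodes right of end, to be probed in lowerIndex (start <= end)
    Nodes between start and end are covered by a single BETWEEN range scan.
    """
    left_nodes = [n for n in path_to(root, start) if n < start]
    right_nodes = [n for n in path_to(root, end) if n > end]
    return left_nodes, right_nodes


# Assumed example: root 32 (data space [1, 63]) and query interval [54, 55].
left, right = query_nodes(54, 55, root=32)
print("leftQueries :", left)
print("rightQueries:", right)
```

Both lists contain at most h nodes, so the transient tables stay small regardless of the number of stored intervals.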

Processing by a Single SQL Query
Join the transient query tables with the B+-tree indexes:

    SELECT id FROM upperIndex AS i JOIN :leftQueries USING (node)
    WHERE i.end >= :start
    UNION ALL
    SELECT id FROM lowerIndex AS i JOIN :rightQueries USING (node)
    WHERE i.start <= :end
    UNION ALL
    SELECT id FROM lowerIndex  -- or upperIndex
    WHERE node BETWEEN :start AND :end

– No duplicates are produced → UNION ALL
– Blocked output of index range scans is guaranteed
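
To see both steps work together, the following Python sketch runs the three-part query against an in-memory SQLite database. Several details are assumptions for illustration: the bound columns are both named `bound` (avoiding the SQL keyword `end`), the transient tables are ordinary temporary tables rather than bind variables, the parameters are called `:qstart` and `:qend`, and the four intervals are the ones implied by the RI-tree example with root = 8.

```python
import sqlite3

ROOT = 8  # assumed root for the data space [1, 15]
CONTRACTS = {"C1": (10, 15), "C2": (5, 13), "C3": (1, 7), "C4": (3, 15)}

QUERY = """
SELECT id FROM upperIndex AS i JOIN leftQueries USING (node)
WHERE i.bound >= :qstart
UNION ALL
SELECT id FROM lowerIndex AS i JOIN rightQueries USING (node)
WHERE i.bound <= :qend
UNION ALL
SELECT id FROM lowerIndex WHERE node BETWEEN :qstart AND :qend
"""


def path_to(root, target):
    """Nodes visited when bisecting from the root down to `target`."""
    node, step = root, root // 2
    path = [node]
    while node != target and step >= 1:
        node = node - step if target < node else node + step
        step //= 2
        path.append(node)
    return path


def build(conn, intervals, root=ROOT):
    """Create both indexes and register every interval at its fork node."""
    conn.executescript("""
        CREATE TABLE lowerIndex (node INTEGER, bound INTEGER, id TEXT);
        CREATE TABLE upperIndex (node INTEGER, bound INTEGER, id TEXT);
        CREATE INDEX lowerIdx ON lowerIndex (node, bound, id);
        CREATE INDEX upperIdx ON upperIndex (node, bound, id);
        CREATE TEMP TABLE leftQueries (node INTEGER);
        CREATE TEMP TABLE rightQueries (node INTEGER);
    """)
    for cid, (s, e) in intervals.items():
        # Fork node: first node on the path to s that lies inside [s, e].
        node = next(n for n in path_to(root, s) if s <= n <= e)
        conn.execute("INSERT INTO lowerIndex VALUES (?, ?, ?)", (node, s, cid))
        conn.execute("INSERT INTO upperIndex VALUES (?, ?, ?)", (node, e, cid))


def intersect(conn, start, end, root=ROOT):
    """Step 1: fill the transient tables. Step 2: run the single SQL query."""
    left = [(n,) for n in path_to(root, start) if n < start]
    right = [(n,) for n in path_to(root, end) if n > end]
    conn.execute("DELETE FROM leftQueries")
    conn.execute("DELETE FROM rightQueries")
    conn.executemany("INSERT INTO leftQueries VALUES (?)", left)
    conn.executemany("INSERT INTO rightQueries VALUES (?)", right)
    rows = conn.execute(QUERY, {"qstart": start, "qend": end}).fetchall()
    return sorted(r[0] for r in rows)


conn = sqlite3.connect(":memory:")
build(conn, CONTRACTS)
print(intersect(conn, 8, 9))
print(intersect(conn, 14, 15))
```

Each stored interval is registered at exactly one node, and the three subqueries address disjoint node sets, which is why UNION ALL suffices and no DISTINCT is needed.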

Extending the RI-tree for IaV Support (1)
Add a value predicate to the RI-tree query:

    SELECT id  -- lower subquery
    FROM upperIndex AS i JOIN :leftQueries USING (node)
    WHERE i.end >= :start
    AND i.value BETWEEN :Value1 AND :Value2
    UNION ALL
    ...  -- upper subquery
    UNION ALL
    SELECT id  -- inner subquery
    FROM lowerIndex  -- or upperIndex
    WHERE node BETWEEN :start AND :end
    AND value BETWEEN :Value1 AND :Value2

Integrate the simple value attribute into lowerIndex/upperIndex:
– old schema: (node, bound, id)
– new schema: ? → depends on the type of query to support

Extending the RI-tree for IaV Support (2)
Viable schemas for the new lowerIndex/upperIndex (estimate the access cost for each query type):
– (value, node, bound, id)
– (node, value, bound, id)
– (node, bound, value, id)
Observations (see paper for details):
– Value Queries: best supported by a (value, node, bound, id) index
    simple attribute predicates = point queries
    evaluation requires the same number of disk accesses as the original procedure
– Range Queries: choice of index not obvious
    inner subquery of Range-Stabbing Queries best supported by a (node, value, bound, id) index
    otherwise: depends on the stored data and the values of the query variables
Question: Can Range Queries be further enhanced?

Improving Range Query Processing (1)
Problem of composite indexes for multiple attributes
– queries may contain range predicates on two or more of the indexed attributes
– tuples satisfying the first predicate lie in a contiguous disk area
– tuples satisfying both/all predicates are scattered within this area
Common solution: using space-filling curves
– mapping multi-dimensional data to one-dimensional values
– similar values of the original data are mapped to similar index values
– ranges of indexed attributes will be found in adjacent disk areas
Application to the RI-tree scenario
– combining some attributes of lowerIndex/upperIndex
– depends on the type of query to support
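
As an illustration of the space-filling-curve idea, the Python sketch below computes a Z-order (Morton) value by interleaving the bits of two non-negative integers. The function names and the 16-bit default width are assumptions. The small demonstration shows one consequence of Z-order being monotone in each coordinate: every point of a query rectangle falls between the Z-values of its two corners, but that one-dimensional range is only a superset, so candidates still have to be filtered.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_decode(z: int, bits: int = 16) -> tuple[int, int]:
    """Inverse of z_value: split the interleaved bits back into (x, y)."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


def in_box(x: int, y: int, x_lo: int, x_hi: int, y_lo: int, y_hi: int) -> bool:
    """Exact two-dimensional range predicate used to filter candidates."""
    return x_lo <= x <= x_hi and y_lo <= y <= y_hi


# Query rectangle [1, 2] x [1, 2]: scan the Z-range between its corners.
z_lo, z_hi = z_value(1, 1), z_value(2, 2)
candidates = [z_decode(z) for z in range(z_lo, z_hi + 1)]
hits = [p for p in candidates if in_box(*p, 1, 2, 1, 2)]
print(f"Z-range [{z_lo}, {z_hi}]: {len(candidates)} candidates, {len(hits)} hits")
```

How tightly the scanned Z-range encloses the rectangle depends on where the rectangle sits relative to the curve, which is one reason the benefit of such an index depends on the data and the query.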

Improving Range Query Processing (2)
Identifying viable schemas for the new lowerIndex/upperIndex
– find subqueries containing several range predicates
    for Range Queries: lower and upper subqueries (bound, value)
    for Range-Interval Queries: inner subquery (node, value)
– combine the respective attributes (x, y) within a space-filling curve {x, y}
– useful combinations for lowerIndex/upperIndex:
    (node, {value, bound})
    ({node, value}, bound)

Improving Range Query Processing (3)
Observations:
– lower and upper subqueries of Range Queries will profit from a (node, {value, bound}) index
– inner subquery of Range-Interval Queries will profit from a ({node, value}, bound) index
– Value Queries will not profit from "space-filling indexes"
Intermediate result
– space-filling indexes can reduce disk accesses in certain cases
– there is no "universal" index supporting all queries to the same extent
– different subqueries will profit from different indexes

Employing index mixes
Identifying the best indexes for each query type
– Value Queries: best supported by a (value, node, bound, id) index
– Range Queries: depends on the data and the space-filling curve (if used)
    different subqueries are best supported by different indexes
    subqueries may be evaluated separately, each using its best index
    drawback: higher cost for index updates and higher storage requirements

Queries          Lower/Upper Subquery      Inner Subquery
Value-Stabbing   (value, node, bound)      (value, node, bound)
Value-Interval   (value, node, bound)      (value, node, bound)
Range-Stabbing   (node, {value, bound})    (node, value, bound)
Range-Interval   (node, {value, bound})    ({node, value})
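
The trade-off behind index mixes can be made concrete with a small lookup. The Python sketch below encodes the table as a dictionary and counts how many distinct indexes a workload would need; `INDEX_MIX` and `indexes_needed` are hypothetical names, and the sketch assumes that the single index listed for Value queries serves both the lower/upper and the inner subqueries.

```python
# Query type -> (index for lower/upper subqueries, index for inner subquery).
# Braces denote attributes combined by a space-filling curve.
INDEX_MIX = {
    "Value-Stabbing": ("(value, node, bound)", "(value, node, bound)"),
    "Value-Interval": ("(value, node, bound)", "(value, node, bound)"),
    "Range-Stabbing": ("(node, {value, bound})", "(node, value, bound)"),
    "Range-Interval": ("(node, {value, bound})", "({node, value})"),
}


def indexes_needed(workload: list[str]) -> set[str]:
    """Distinct index schemas required to serve every query type in the workload."""
    needed: set[str] = set()
    for query_type in workload:
        needed.update(INDEX_MIX[query_type])
    return needed


value_only = indexes_needed(["Value-Stabbing", "Value-Interval"])
everything = indexes_needed(list(INDEX_MIX))
print(len(value_only), "index schema for value queries")
print(len(everything), "index schemas for all four query types")
```

Every additional schema in the result set has to be maintained on each insert, update, and delete, which is the storage and update cost the slide lists as the drawback.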

Adapting the RI-tree Algorithms (1)
Example: evaluate a contracts query using a "space-filling index"
Contracts table:
– node and Z-order value calculated for each tuple
– B-tree index on (node, Z(budget, start), no)
[Example table: No. | Budget (k€) | Period (Start, End) | Node | Z(budget, start); five sample rows, values not preserved]

Adapting the RI-tree Algorithms (2)
Range-Interval Query: value range (1, 12); interval (3, 6)
Evaluation of the upper subquery with the Z-order index

Access Cost with Varying Table Sizes
[Plots: Value-Stabbing Queries; Value-Interval Queries]

Access Cost with Varying Table Sizes
[Plots: Range-Stabbing Queries; Range-Interval Queries]

Access cost for varying length of ranges
[Plots: Stabbing Queries; Interval Queries]

Access cost for varying length of ranges
[Plot: Range Queries]

Comparison with competing techniques

Conclusions
– Processing Interval-and-Value Tuples in SQL databases
– Extensions of the Relational Interval Tree
– Various types of queries
    Range vs. Value Queries
    Interval vs. Stabbing Queries
– Experiments demonstrate high performance
Future work:
– Extend the proposed techniques to more complex queries (joins)
– Cost models to predict benefits for an evolving query workload