Efficiently Processing Queries on Interval-and-Value Tuples in Relational Databases Jost Enderle, Nicole Schneider, Thomas Seidl RWTH Aachen University,


Presentation transcript:

Efficiently Processing Queries on Interval-and-Value Tuples in Relational Databases
Jost Enderle, Nicole Schneider, Thomas Seidl
RWTH Aachen University, Germany
VLDB 2005, Trondheim
Data Management and Exploration, Prof. Dr. Thomas Seidl

Outline
– Interval-and-Value (IaV) Data and Applications
– Relational Interval Tree (RI-tree)
– Managing Interval-and-Value Tuples Using RI-tree
– Experimental Results

Interval-and-Value Data: Example
Contracts table: storing period and budget of contracts
    CREATE TABLE contracts (
      c_no     VARCHAR(10),                      -- key
      c_budget DECIMAL(10,2),                    -- simple-valued attribute
      c_period ROW (c_start DATE, c_end DATE))   -- interval
[Table: No. | Budget (k€) | Period (Start, End), five example contracts; the cell values are not preserved in this transcript]

Interval-and-Value Data: Query
Sample query on contracts table:
    -- Find all contracts
    SELECT c_no FROM contracts
    -- within certain budget range
    WHERE c_budget BETWEEN 500 AND 2000
    -- running during certain time interval
    AND c_period OVERLAPS (DATE '…', DATE '…')
Special cases of this general Range-Interval Query:
– Value-Interval Query: value range is a single point
– Range-Stabbing Query: query interval is a single point
– Value-Stabbing Query: both restrictions hold
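The four query types differ only in whether the value range and the query interval collapse to single points. The following is a minimal sketch in Python with the standard sqlite3 module, under several assumptions: the interval is flattened into two text columns c_start and c_end holding ISO dates, OVERLAPS is replaced by closed-interval intersection (the endpoint semantics of SQL's OVERLAPS may differ), and the rows A to D are made up for illustration and are not the data from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts ("
    " c_no TEXT PRIMARY KEY, c_budget REAL, c_start TEXT, c_end TEXT)"
)
# Hypothetical sample rows; the values of the slide's table are not available.
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?, ?)",
    [
        ("A", 400.0, "2005-01-01", "2005-06-30"),
        ("B", 1500.0, "2005-03-01", "2005-12-31"),
        ("C", 1800.0, "2006-01-01", "2006-03-31"),
        ("D", 2500.0, "2005-05-01", "2005-09-30"),
    ],
)


def range_interval_query(v_lo, v_hi, q_start, q_end):
    """Contracts with budget in [v_lo, v_hi] whose period intersects [q_start, q_end]."""
    rows = conn.execute(
        "SELECT c_no FROM contracts"
        " WHERE c_budget BETWEEN ? AND ?"
        " AND c_start <= ? AND c_end >= ?"  # closed-interval intersection test
        " ORDER BY c_no",
        (v_lo, v_hi, q_end, q_start),
    )
    return [r[0] for r in rows]


def value_interval_query(v, q_start, q_end):
    # Value range is a single point.
    return range_interval_query(v, v, q_start, q_end)


def range_stabbing_query(v_lo, v_hi, t):
    # Query interval is a single point.
    return range_interval_query(v_lo, v_hi, t, t)


def value_stabbing_query(v, t):
    # Both restrictions hold.
    return range_interval_query(v, v, t, t)


print(range_interval_query(500, 2000, "2005-06-01", "2005-08-31"))
print(range_stabbing_query(0, 3000, "2005-06-15"))
```

Expressing the three special cases as calls to the general query mirrors the slide's classification; it says nothing yet about which index serves each case best.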

Motivation of Relational Indexing
Main Memory Structures
  – no persistency, no disk block structure
Secondary Storage Structures
  + persistency, high block-oriented efficiency
  – integration into DBMS kernel typically not supported (GiST?)
Relational Storage Structures
  + basic idea: don't extend, just use the RDBMS (virtual storage machine)
  + sound formal foundation, little implementation effort
  + immediate industrial strength (availability, robustness, ACID, …)
  + high efficiency by exploiting built-in indexing structures (B+-tree)

Relational Interval Tree
Two relational indexes (B+-trees) store the interval bounds:
  lowerIndex (node, start, id): (4, 1, C3), (8, 3, C4), (8, 5, C2), (12, 10, C1)
  upperIndex (node, end, id): (4, 7, C3), (8, 13, C2), (8, 15, C4), (12, 15, C1)
– Supported by any RDBMS: no modification of built-in B+-trees
– Optimal complexities for space, updates, and intersection queries
– root = 2^(h-1)
[Kriegel, Pötke, Seidl: VLDB 2000], based on [Edelsbrunner 1980]
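How an interval ends up at a particular node can be sketched as a descent through the virtual binary backbone: starting at the root, step left or right until the current node falls inside the interval. The Python sketch below is an illustration under assumptions, not the authors' code: the function name fork_node is hypothetical, the root is taken as 8 (h = 4), and the four intervals are read off the index entries shown above (start from lowerIndex, end from upperIndex).

```python
def fork_node(start, end, root):
    """Return the backbone node at which the interval [start, end] is registered.

    Assumes root is a power of two and 1 <= start <= end <= 2 * root - 1.
    """
    node, step = root, root // 2
    while step >= 1:
        if end < node:        # interval lies entirely left of this node
            node -= step
        elif node < start:    # interval lies entirely right of this node
            node += step
        else:                 # node is inside the interval: stop here
            break
        step //= 2
    return node


ROOT = 8  # assumed: 2^(h-1) with h = 4
intervals = {"C1": (10, 15), "C2": (5, 13), "C3": (1, 7), "C4": (3, 15)}

# One entry per interval in each index, ordered like a B+-tree on (node, bound, id).
lower_index = sorted((fork_node(s, e, ROOT), s, cid) for cid, (s, e) in intervals.items())
upper_index = sorted((fork_node(s, e, ROOT), e, cid) for cid, (s, e) in intervals.items())

print(lower_index)
print(upper_index)
```

Because the backbone is only computed, never stored, the two sorted lists are all that would need to be persisted.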

Single Interval Query Processing
Two steps to process an interval query:
1. Transform interval query into a set of range queries
   – The generated queries are collected in transient tables (no I/Os)
2. Perform a single SQL query
   – Join the transient query tables with the relational indexes

Preprocessing: Generate Query Ranges
Generate a set of range queries for lowerIndex and upperIndex:
– At nodes left of start: report entries i with i.end ≥ start (upperIndex; example nodes 32, 48, 52)
– At nodes right of end: report entries i with i.start ≤ end (lowerIndex; example node 56)
– For nodes between start and end: report all entries
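The node lists can be generated without any I/O by walking the backbone arithmetically. The sketch below collects the nodes on the search path to start that lie left of start, and the nodes on the search path to end that lie right of end. It is a simplified formulation, not necessarily the traversal used in the paper. The function names, the root of 32, and the query interval [54, 55] are assumptions chosen to be consistent with the example nodes above.

```python
def search_path(target, root):
    """Nodes visited when descending the virtual backbone from root to target."""
    node, step = root, root // 2
    path = [node]
    while node != target and step >= 1:
        node = node + step if node < target else node - step
        step //= 2
        path.append(node)
    return path


def generate_query_nodes(start, end, root):
    """Return (left_queries, right_queries) for the query interval [start, end]."""
    # Nodes left of start can only hold intersecting intervals if they are
    # ancestors of start; the same holds for nodes right of end.
    left_queries = [n for n in search_path(start, root) if n < start]
    right_queries = [n for n in search_path(end, root) if n > end]
    return left_queries, right_queries


left, right = generate_query_nodes(54, 55, 32)
print(left, right)
```

Nodes between start and end need no list at all; they are covered by a single range predicate on the node column.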

Processing by a Single SQL Query
Join transient query tables with B+-tree indexes:
    SELECT id FROM upperIndex AS i JOIN :leftQueries USING (node)
    WHERE i.end >= :start
    UNION ALL
    SELECT id FROM lowerIndex AS i JOIN :rightQueries USING (node)
    WHERE i.start <= :end
    UNION ALL
    SELECT id FROM lowerIndex -- or upperIndex
    WHERE node BETWEEN :start AND :end
– No duplicates are produced → UNION ALL
– Blocked output of index range scans is guaranteed
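The two steps can be tried end to end with SQLite. This is a sketch under stated assumptions rather than the authors' implementation: lowerIndex and upperIndex are modelled as WITHOUT ROWID tables keyed on (node, bound, id), the transient tables are ordinary temporary tables, the bound columns are both called bound to avoid the reserved word END, the root is 8, and the four intervals C1 to C4 are taken from the RI-tree example earlier in the deck.

```python
import sqlite3

ROOT = 8  # assumed root of the virtual backbone (covers 1..15)


def search_path(target, root):
    """Nodes visited when descending the virtual backbone from root to target."""
    node, step = root, root // 2
    path = [node]
    while node != target and step >= 1:
        node = node + step if node < target else node - step
        step //= 2
        path.append(node)
    return path


def fork_node(start, end, root):
    # First node on the search path that lies inside the interval.
    return next(n for n in search_path(start, root) if start <= n <= end)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lowerIndex (node INTEGER, bound INTEGER, id TEXT,
                             PRIMARY KEY (node, bound, id)) WITHOUT ROWID;
    CREATE TABLE upperIndex (node INTEGER, bound INTEGER, id TEXT,
                             PRIMARY KEY (node, bound, id)) WITHOUT ROWID;
    CREATE TEMP TABLE leftQueries (node INTEGER);
    CREATE TEMP TABLE rightQueries (node INTEGER);
""")

for ident, (start, end) in {"C1": (10, 15), "C2": (5, 13),
                            "C3": (1, 7), "C4": (3, 15)}.items():
    node = fork_node(start, end, ROOT)
    conn.execute("INSERT INTO lowerIndex VALUES (?, ?, ?)", (node, start, ident))
    conn.execute("INSERT INTO upperIndex VALUES (?, ?, ?)", (node, end, ident))


def intersection_query(start, end):
    """Ids of all stored intervals that intersect [start, end]."""
    # Step 1: fill the transient tables (pure arithmetic, no index access).
    conn.execute("DELETE FROM leftQueries")
    conn.execute("DELETE FROM rightQueries")
    conn.executemany("INSERT INTO leftQueries VALUES (?)",
                     [(n,) for n in search_path(start, ROOT) if n < start])
    conn.executemany("INSERT INTO rightQueries VALUES (?)",
                     [(n,) for n in search_path(end, ROOT) if n > end])
    # Step 2: one SQL statement joining the transient tables with the indexes.
    rows = conn.execute("""
        SELECT id FROM upperIndex AS i JOIN leftQueries USING (node)
        WHERE i.bound >= :qstart
        UNION ALL
        SELECT id FROM lowerIndex AS i JOIN rightQueries USING (node)
        WHERE i.bound <= :qend
        UNION ALL
        SELECT id FROM lowerIndex
        WHERE node BETWEEN :qstart AND :qend
    """, {"qstart": start, "qend": end})
    return sorted(r[0] for r in rows)


print(intersection_query(14, 15))
print(intersection_query(2, 4))
```

Each interval is registered at exactly one node, and the three subqueries address disjoint node sets, which is why UNION ALL suffices.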

Extending the RI-tree for IaV Support (1)
Add value predicate to RI-tree query:
    SELECT id -- lower subquery
    FROM upperIndex AS i JOIN :leftQueries USING (node)
    WHERE i.end >= :start AND i.value BETWEEN :Value1 AND :Value2
    UNION ALL
    ... -- upper subquery
    UNION ALL
    SELECT id -- inner subquery
    FROM lowerIndex -- or upperIndex
    WHERE node BETWEEN :start AND :end AND value BETWEEN :Value1 AND :Value2
Integrate simple value attribute into lower-/upperIndex:
– old schema: (node, bound, id)
– new schema: ? → depends on type of query to support
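A runnable variant of the extended query, again with SQLite, may help to see where the value predicate attaches. Everything specific here is an assumption made for illustration: the schema order (node, bound, value, id) is just one arbitrary candidate, the budgets assigned to C1 to C4 are invented, the root is 8, and the column is named bound instead of start/end.

```python
import sqlite3

ROOT = 8  # assumed root of the virtual backbone


def search_path(target, root):
    """Nodes visited when descending the virtual backbone from root to target."""
    node, step = root, root // 2
    path = [node]
    while node != target and step >= 1:
        node = node + step if node < target else node - step
        step //= 2
        path.append(node)
    return path


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lowerIndex (node INTEGER, bound INTEGER, value INTEGER, id TEXT,
                             PRIMARY KEY (node, bound, value, id)) WITHOUT ROWID;
    CREATE TABLE upperIndex (node INTEGER, bound INTEGER, value INTEGER, id TEXT,
                             PRIMARY KEY (node, bound, value, id)) WITHOUT ROWID;
    CREATE TEMP TABLE leftQueries (node INTEGER);
    CREATE TEMP TABLE rightQueries (node INTEGER);
""")

# (id, start, end, value); the values are hypothetical budgets.
for ident, start, end, value in [("C1", 10, 15, 300), ("C2", 5, 13, 800),
                                 ("C3", 1, 7, 1200), ("C4", 3, 15, 2500)]:
    node = next(n for n in search_path(start, ROOT) if start <= n <= end)
    conn.execute("INSERT INTO lowerIndex VALUES (?, ?, ?, ?)", (node, start, value, ident))
    conn.execute("INSERT INTO upperIndex VALUES (?, ?, ?, ?)", (node, end, value, ident))


def range_interval_query(v1, v2, start, end):
    """Ids with value in [v1, v2] whose interval intersects [start, end]."""
    conn.execute("DELETE FROM leftQueries")
    conn.execute("DELETE FROM rightQueries")
    conn.executemany("INSERT INTO leftQueries VALUES (?)",
                     [(n,) for n in search_path(start, ROOT) if n < start])
    conn.executemany("INSERT INTO rightQueries VALUES (?)",
                     [(n,) for n in search_path(end, ROOT) if n > end])
    rows = conn.execute("""
        SELECT id FROM upperIndex AS i JOIN leftQueries USING (node)   -- lower subquery
        WHERE i.bound >= :qstart AND i.value BETWEEN :v1 AND :v2
        UNION ALL
        SELECT id FROM lowerIndex AS i JOIN rightQueries USING (node)  -- upper subquery
        WHERE i.bound <= :qend AND i.value BETWEEN :v1 AND :v2
        UNION ALL
        SELECT id FROM lowerIndex                                      -- inner subquery
        WHERE node BETWEEN :qstart AND :qend AND value BETWEEN :v1 AND :v2
    """, {"qstart": start, "qend": end, "v1": v1, "v2": v2})
    return sorted(r[0] for r in rows)


print(range_interval_query(500, 2000, 6, 7))
```

The query text stays the same whichever column order the index uses; only the number of index pages touched changes.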

Extending the RI-tree for IaV Support (2)
Viable schemas for new lower-/upperIndexes (estimate access cost for each query type):
– (value, node, bound, id)
– (node, value, bound, id)
– (node, bound, value, id)
Observations (see paper for details):
– Value Queries: best supported by (value, node, bound, id) index
  – simple attribute predicates = point queries
  – evaluation requires the same number of disk accesses as the original procedure
– Range Queries: choice of index not obvious
  – inner subquery of Range-Stabbing Queries best supported by (node, value, bound, id)
  – otherwise: depends on stored data and values of query variables
Question: Can Range Queries be further enhanced?

Improving Range Query Processing (1)
Problem of composite indexes for multiple attributes
– queries may contain range predicates on two or more of the indexed attributes
– tuples satisfying the first predicate lie in a contiguous disk area
– tuples satisfying both/all predicates are scattered within this area
Common solution: using space-filling curves
– mapping multi-dimensional data to one-dimensional values
– similar values of the original data are mapped to similar index values
– ranges of indexed attributes will be found in adjacent disk areas
Application to the RI-tree scenario
– combining some attributes of lower-/upperIndex
– depends on type of query to support
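One widely used space-filling curve is the Z-order (Morton) curve, which interleaves the bits of the two coordinates; the deck uses a Z-order value Z(budget, start) in its later example. The sketch below is a minimal illustration with hypothetical function names and an assumed width of 16 bits per coordinate.

```python
def z_encode(x, y, bits=16):
    """Interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_decode(z, bits=16):
    """Inverse of z_encode: recover (x, y) from an interleaved value."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


print(z_encode(3, 5))   # x = 011b, y = 101b -> 100111b = 39
print(z_decode(39))     # (3, 5)
```

A useful property for range queries is that the mapping is monotone in each coordinate: if x1 <= x2 and y1 <= y2, then z_encode(x1, y1) <= z_encode(x2, y2), so every point of a query box lies between the Z-values of its lower-left and upper-right corners.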

Improving Range Query Processing (2)
Identifying viable schemas for new lower-/upperIndexes
– find subqueries containing several range predicates
  – for Range Queries: lower and upper subqueries (bound, value)
  – for Range-Interval Queries: inner subquery (node, value)
– combine respective attributes (x, y) within space-filling curve {x, y}
– useful combinations for lower-/upperIndex:
  – (node, {value, bound})
  – ({node, value}, bound)

Improving Range Query Processing (3)
Observations:
– lower and upper subqueries of Range Queries will profit from a (node, {value, bound}) index
– inner subquery of Range-Interval Queries will profit from a ({node, value}, bound) index
– Value Queries will not profit from "space-filling indexes"
Intermediate result:
– space-filling indexes can reduce disk accesses in certain cases
– there is no "universal" index supporting all queries to the same extent
– different subqueries will profit from different indexes

Employing index mixes
Identifying best indexes for each query type
– Value Queries: best supported by (value, node, bound, id) index
– Range Queries: depends on data and space-filling curve (if used)
  – different subqueries best supported by different indexes
  – subqueries may be evaluated separately using best index
  – drawback: higher cost for index updates and storage requirements

Queries          Lower/Upper Subquery      Inner Subquery
Value-Stabbing   (value, node, bound)      (value, node, bound)
Value-Interval   (value, node, bound)      (value, node, bound)
Range-Stabbing   (node, {value, bound})    (node, value, bound)
Range-Interval   (node, {value, bound})    ({node, value})

Adapting the RI-tree Algorithms (1)
Example: Evaluate a contracts query using a "space-filling index"
Contracts table:
– Node and Z-order value calculated for each tuple
– B-tree index on (node, Z(budget, start), no)
[Table: No. | Budget (k€) | Period (Start, End) | Node | Z(budget, start), five example contracts; the cell values are not preserved in this transcript]

Adapting the RI-tree Algorithms (2)
Range-Interval Query: value range (1, 12); interval (3, 6)
Evaluation of upper subquery with Z-order index
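Since the figure for this slide is not available, the following sketch only illustrates the general idea of evaluating one such subquery on a (node, Z(value, start), id) index. It relies on several assumptions: the sorted Python list stands in for the B-tree, the tuples P to T and their node assignments are invented, a single node (8) is probed instead of the full rightQueries list, 0 is taken as the smallest possible start, and the scan simply reads the whole Z-interval between the two corners of the query box and filters out false positives. The technique in the paper may prune the Z-interval more aggressively.

```python
import bisect


def z_encode(x, y, bits=8):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_decode(z, bits=8):
    """Recover (x, y) from an interleaved value."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


# Hypothetical tuples: (id, node, value, start).
tuples = [("P", 8, 5, 3), ("Q", 8, 14, 1), ("R", 8, 3, 9),
          ("S", 8, 12, 6), ("T", 12, 5, 3)]

# Sorted (node, Z(value, start), id) entries emulate the B-tree order.
index = sorted((node, z_encode(value, start), ident)
               for ident, node, value, start in tuples)


def z_range_scan(node, v_lo, v_hi, b_lo, b_hi):
    """Scan one node for entries with value in [v_lo, v_hi] and bound in [b_lo, b_hi].

    Returns (matching ids, number of index entries read).
    """
    z_lo, z_hi = z_encode(v_lo, b_lo), z_encode(v_hi, b_hi)
    first = bisect.bisect_left(index, (node, z_lo))
    last = bisect.bisect_left(index, (node, z_hi + 1))
    hits = []
    for _, z, ident in index[first:last]:
        value, bound = z_decode(z)
        # The Z-interval is a superset of the query box, so re-check both ranges.
        if v_lo <= value <= v_hi and b_lo <= bound <= b_hi:
            hits.append(ident)
    return hits, last - first


# Upper subquery for value range (1, 12) and interval (3, 6): start <= 6.
hits, scanned = z_range_scan(8, 1, 12, 0, 6)
print(hits, scanned)
```

In this made-up data set the scan reads three entries and returns two of them; entry Q falls inside the Z-interval but outside the value range, which shows why the post-filter is needed.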

Access Cost with Varying Table Sizes
[Figure panels: Value-Stabbing Queries; Value-Interval Queries]

Access Cost with Varying Table Sizes
[Figure panels: Range-Stabbing Queries; Range-Interval Queries]

Access Cost for Varying Length of Ranges
[Figure panels: Stabbing Queries; Interval Queries]

Access Cost for Varying Length of Ranges
[Figure panel: Range Queries]

Comparison with Competing Techniques
[Figure]

Conclusions
– Processing Interval-and-Value Tuples in SQL databases
– Extensions of the Relational Interval Tree
– Various types of queries
  – Range vs. Value Queries
  – Interval vs. Stabbing Queries
– Experiments demonstrate high performance
Future work:
– Extend proposed techniques to more complex queries (joins)
– Cost models to predict benefits for evolving query workload