Efficient Computation of Temporal Aggregates with Range Predicates. D. Zhang, A. Markowetz, V. J. Tsotras, D. Gunopulos and B. Seeger.

Presentation transcript:

Efficient Computation of Temporal Aggregates with Range Predicates
D. Zhang*, A. Markowetz**, V. J. Tsotras*, D. Gunopulos* and B. Seeger**
* University of California, Riverside
** Philipps Universität Marburg, Germany

Outline
- Introduction & Motivation
- Problem Decomposition
- The MVSB-tree
- Performance Results
- Conclusions

Introduction & Motivation
Consider a collection of temporal records. Each record has a key k, a value v, and a time interval [t1, t2]. E.g.: employees and their salaries over time. Temporal aggregation: aggregate values over time. Focus on SUM/COUNT/AVG.

Previous Work
- ‘Given time t, aggregate over all records that contain t.’ [Tum92, KS95, YK97, GHR+, MLI00] E.g. the sum at t2 is 13.
- ‘Given interval [t1, t2], aggregate over all records that intersect [t1, t2].’ (SB-tree [YW01]) E.g. the sum over [t1, t2] is 28.

Range-Temporal Aggregation (RTA)
‘Aggregate over all records intersecting interval [t1, t2] with keys in range [k1, k2].’ E.g. the RTA-sum over [k1, k2] x [t1, t2] is 19. Example query: find the AVG salary over the past ten years of all employees whose last names start with ‘B’.

Alternatives
- Previous approaches would need a separate index for each possible key range (inefficient).
- Straightforward approach: index the records and run a selection query, ‘find all records intersecting [k1, k2] x [t1, t2]’, then aggregate the result. Query time is O(n).
- Our solution: O(log_b n).
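The straightforward O(n) alternative can be sketched as a plain scan. This is only an illustrative baseline, not code from the presentation: the `Record` type, the function name `rta_scan`, the half-open record interval [start, end), and the half-open key range [k1, k2) are all assumptions made for the example.

```python
from typing import List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    key: int
    start: int    # assumed convention: the record is valid during [start, end)
    end: int
    value: float


def rta_scan(records: List[Record], k1: int, k2: int,
             t1: int, t2: int) -> Tuple[float, int, Optional[float]]:
    """Return (SUM, COUNT, AVG) over records with k1 <= key < k2
    whose interval intersects the closed query interval [t1, t2]."""
    total, count = 0.0, 0
    for r in records:                      # touches every record: O(n)
        in_key_range = k1 <= r.key < k2
        intersects = r.start <= t2 and r.end > t1
        if in_key_range and intersects:
            total += r.value
            count += 1
    avg = total / count if count else None  # AVG is derived from SUM and COUNT
    return total, count, avg


records = [
    Record(10, 1, 5, 4.0),
    Record(20, 3, 9, 6.0),
    Record(30, 2, 4, 8.0),
    Record(20, 9, 12, 2.0),
]
print(rta_scan(records, 10, 30, 4, 8))  # (10.0, 2, 5.0)
```

Because AVG is just SUM divided by COUNT, an index only has to maintain those two aggregates.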

Problem Decomposition
Decompose RTA into LKST and LKLT queries. LKST query: given k, t, aggregate over all records with keys less than k and intervals containing t. E.g. LKST(k2, t2) = 11.

LKLT query: given k, t, aggregate over all records with keys less than k and intervals ending before t. E.g. LKLT(k2, t2) = 20.

The RTA query is decomposed into LKST and LKLT queries:
RTA([k1, k2] x [t1, t2])
  = LKST(k2, t2) - LKST(k1, t2)
  + LKLT(k2, t2) - LKLT(k1, t2)
  - LKLT(k2, t1) + LKLT(k1, t1)
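The identity can be checked numerically with brute-force versions of the two sub-queries. The sketch below is a sanity check under assumed boundary conventions that the slides do not spell out: records are tuples (key, start, end, value) valid during [start, end), "ending before t" means end <= t, the key range is [k1, k2), and the query interval [t1, t2] is closed. All function names are hypothetical.

```python
import random


def lkst(records, k, t):
    """SUM over records with key < k whose interval [start, end) contains t."""
    return sum(v for key, s, e, v in records if key < k and s <= t < e)


def lklt(records, k, t):
    """SUM over records with key < k whose interval ends at or before t."""
    return sum(v for key, s, e, v in records if key < k and e <= t)


def rta_decomposed(records, k1, k2, t1, t2):
    """RTA-sum assembled from the six LKST/LKLT terms of the decomposition."""
    return (lkst(records, k2, t2) - lkst(records, k1, t2)
            + lklt(records, k2, t2) - lklt(records, k1, t2)
            - lklt(records, k2, t1) + lklt(records, k1, t1))


def rta_direct(records, k1, k2, t1, t2):
    """RTA-sum by definition: key in [k1, k2), interval intersects [t1, t2]."""
    return sum(v for key, s, e, v in records
               if k1 <= key < k2 and s <= t2 and e > t1)


rng = random.Random(0)
records = []
for _ in range(200):
    start = rng.randint(0, 50)
    records.append((rng.randint(0, 20), start,
                    start + rng.randint(1, 10), rng.randint(1, 9)))


def check(trials=500):
    """Compare both computations on random query rectangles."""
    for _ in range(trials):
        k1 = rng.randint(0, 20)
        k2 = rng.randint(k1, 21)
        t1 = rng.randint(0, 60)
        t2 = rng.randint(t1, 60)
        assert (rta_decomposed(records, k1, k2, t1, t2)
                == rta_direct(records, k1, k2, t1, t2))
    return True


print(check())
```

The first four terms count every record in the key range that started no later than t2; the last two subtract those that had already ended by t1.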

Index Design
Both LKST and LKLT are point queries: ‘given k, t, return a value’. An index for LKST and LKLT should:
- store points in key-time space;
- maintain a value for each point;
- support point queries.

Model
Assume updates come in increasing time order (transaction-time model). A record is inserted at its start time t1 and updated at its end time t2.

The LKST index
The effect of inserting record (k, [t1, t2], v), shown at t1 and at t2. [figure]

The LKLT index
The effect of inserting record (k, [t1, t2], v): no update at t1; the index changes only at t2. [figure]

Update Operation
Common update operation for both indices: insert (k, t):v. That is: add v to all points in [k, t] x [kmax, tmax], i.e. to every point (k', t') with k <= k' <= kmax and t <= t' <= tmax. Conclusion: an index supporting point queries and the above update can be used for both LKLT and LKST.
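To make the update and point-query interface concrete, here is a brute-force stand-in that keeps a flat list of updates. It is not the MVSB-tree and has none of its performance guarantees. The class names are hypothetical, and two details are assumptions: the sign pattern (+v at the start time and -v at the end time for LKST, +v at the end time for LKLT), and the strict key comparison, chosen so that a point query means "keys less than k".

```python
class DominanceSum:
    """List-based stand-in for an index with insert (k, t):v and point queries."""

    def __init__(self):
        self.updates = []            # (key, time, value), in arrival order

    def insert(self, k, t, v):
        """Add v to every point (k', t') with k' > k and t' >= t."""
        self.updates.append((k, t, v))

    def point_query(self, k, t):
        """Value stored at point (k, t): sum of all updates it dominates."""
        return sum(v for uk, ut, v in self.updates if uk < k and ut <= t)


class RtaIndex:
    """Two dominance-sum indices, one for LKST and one for LKLT."""

    def __init__(self):
        self.lkst = DominanceSum()
        self.lklt = DominanceSum()

    def start_record(self, key, t, v):
        self.lkst.insert(key, t, v)      # record becomes alive at t
        # assumed: the LKLT index is not touched when a record starts

    def end_record(self, key, t, v):
        self.lkst.insert(key, t, -v)     # record is no longer alive from t on
        self.lklt.insert(key, t, v)      # record now counts as "ended before"

    def rta(self, k1, k2, t1, t2):
        """RTA-sum over keys in [k1, k2) and times in [t1, t2]: six point queries."""
        q, p = self.lkst.point_query, self.lklt.point_query
        return (q(k2, t2) - q(k1, t2)
                + p(k2, t2) - p(k1, t2)
                - p(k2, t1) + p(k1, t1))


idx = RtaIndex()
# Updates arrive in increasing time order (transaction-time model).
idx.start_record(10, 1, 4)
idx.start_record(30, 2, 8)
idx.start_record(20, 3, 6)
idx.end_record(30, 4, 8)
idx.end_record(10, 5, 4)
idx.end_record(20, 9, 6)
idx.start_record(20, 9, 2)
print(idx.rta(10, 30, 4, 8))   # 10
```

Replacing `DominanceSum` with a structure that answers `point_query` in logarithmic time is the role the MVSB-tree plays in the presentation.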

The MVSB-tree
A partially persistent SB-tree. It inherits features from both the SB-tree [YW01] and the MVBT [BGO+96].

Insertion
To handle overflow, copy records with end = tmax to a new page. Strong overflow: limit the number of records in a new page. Roots in the example: root1: [1, 4); root2: [4, tmax). [figure]

Point Query (k, t)
Follows a single path: the nodes containing (k, t). Aggregates the values found in this path. E.g.: PointQuery(23, 7) = 5 + 2 = 7.
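The single-path traversal can be sketched on a toy two-level structure. The node layout below is hypothetical: each entry is a rectangle in key-time space with a value and an optional child, and the rectangles are invented so that the query reproduces the 5 + 2 = 7 example. It is not the actual MVSB-tree page format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

T_MAX = 10**9   # assumed sentinel for "still open" time ranges


@dataclass
class Entry:
    k_lo: int                     # key range [k_lo, k_hi)
    k_hi: int
    t_lo: int                     # time range [t_lo, t_hi)
    t_hi: int
    value: int
    child: Optional["Node"] = None

    def contains(self, k: int, t: int) -> bool:
        return self.k_lo <= k < self.k_hi and self.t_lo <= t < self.t_hi


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def point_query(root: Node, k: int, t: int) -> int:
    """Follow the one entry per level that contains (k, t), summing its value."""
    total, node = 0, root
    while node is not None:
        hit = next((e for e in node.entries if e.contains(k, t)), None)
        if hit is None:
            break
        total += hit.value
        node = hit.child
    return total


leaf = Node([Entry(20, 30, 4, T_MAX, 2), Entry(30, 40, 4, T_MAX, 1)])
root = Node([Entry(0, 20, 1, T_MAX, 3), Entry(20, 100, 1, T_MAX, 5, leaf)])
print(point_query(root, 23, 7))   # 5 + 2 = 7
```

Only one entry per level is visited, so the cost is proportional to the height of the structure rather than to the number of records.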

Efficiency
Theorem: with 2 MVSBT indices, we achieve:
- RTA query: O(log_b n);
- Update: O(log_b K);
- Space: O((n/b) log_b K).
Here n = number of updates; K = number of distinct keys; b = page capacity (in records).
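As a rough feel for the gap between O(log_b n) and O(n), the sketch below compares the number of levels in a b-ary structure with the number of pages a full scan would read. The values n = 10^6 and b = 100 are assumed purely for illustration, and asymptotic bounds hide constant factors, so these are orders of magnitude, not predicted I/O counts.

```python
def tree_levels(n: int, b: int) -> int:
    """Smallest h with b**h >= n, i.e. ceil(log_b n) using integer arithmetic."""
    h, capacity = 0, 1
    while capacity < n:
        capacity *= b
        h += 1
    return h


def scan_pages(n: int, b: int) -> int:
    """Pages needed to hold n records at b records per page."""
    return -(-n // b)   # ceiling division


n, b = 10**6, 100       # assumed values, not taken from the slides
print(tree_levels(n, b), scan_pages(n, b))   # 3 10000
```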

Performance Results
- Platform: Sun Enterprise 250 Server; two 300 MHz UltraSPARC-II processors; Solaris 2.8; GNU C++.
- Datasets: created using the TimeIT [KS98] software and transformed to add record keys. Each dataset has a million records (10k unique keys; on average 100 intervals per key).
- Baseline: compare against the straightforward approach using the MVBT [BGO+96] as a temporal index.

Index Sizes [figure]

Query Speedup
Query time is averaged over 100 queries with the same query-rectangle size. [figure]

Conclusions
- We addressed the range-temporal aggregation (RTA) problem.
- New index structure (MVSB-tree) for incrementally maintaining and efficiently computing RTAs.
- Query time reduced from O(n) to O(log_b n) with small space overhead.
- Open problems:
  - Min/Max range-temporal aggregation;
  - Valid-time environment;
  - Multi-dimensional aggregation over objects with extents.