Advanced Database Aggregation Query Processing

Presentation transcript:

Advanced Database Aggregation Query Processing Donghui Zhang Computer Science Department University of California, Riverside 3/28/2002 EDBT Ph.D. Workshop 2002

Aggregation Problem Maintain a set of objects, each having a value. Given a condition that holds for a subset of the objects, compute the total value of the objects in this subset. E.g. “find the total salary of employees who joined the company less than a year ago”.

Aggregation over Objects with Extent Objects with extent, as opposed to point objects. Real-life applications: temporal, spatial, etc. An employee works for the company during a certain period of time; “find the total salary of employees who worked for the company during 1999”. A rainfall record covers a spatial region; “find the total volume of rainfall in Los Angeles”.

Functional Box-Sum Maintain a set of objects, each having a box and a value function; given query box q, compute the total value of objects intersecting q, where the contribution of an object is the integral of its value function over its intersection with q.

Functional Box-Sum Example: functional box-sum = 4*50 + 3*12 = 236.
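The arithmetic on this slide can be reproduced with a brute-force scan. The sketch below assumes each object has a constant value per unit area, so its contribution is the value times the intersection area. The slide's figure is not preserved, so the `Box` type, the function names, and the coordinates are hypothetical; the coordinates were chosen only so that the intersection areas come out to 50 and 12.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle [x1, x2] x [y1, y2] (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap of two boxes, or 0 if they do not overlap."""
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return w * h if w > 0 and h > 0 else 0.0


def box_sum_scan(objects, q: Box) -> float:
    """Functional box-sum by linear scan, for constant value functions.

    objects: iterable of (Box, value per unit area).
    """
    return sum(v * intersection_area(b, q) for b, v in objects)


# Hypothetical data: overlaps with the query have areas 50 and 12.
objects = [(Box(0, 0, 10, 10), 4), (Box(18, 4, 25, 12), 3)]
query = Box(5, 0, 20, 10)
total = box_sum_scan(objects, query)  # 4*50 + 3*12
```

This scan visits every object, which is the cost the specialized index is meant to avoid.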

Functional Box-Sum Moreover, the object value can be a function; $\mathrm{FBS} = \int_{7}^{11}\int_{15}^{20} (x - 2)\,dx\,dy = 310$.
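The value 310 can be checked by hand. The derivation below assumes, from the numbers that survive on the slide, that the value function is x - 2 and that the object's intersection with the query box spans 15 to 20 in x and 7 to 11 in the other dimension; the variable name y is an assumption.

```latex
\begin{align*}
\mathrm{FBS} &= \int_{7}^{11}\int_{15}^{20} (x-2)\,dx\,dy \\
             &= (11-7)\left[\tfrac{x^{2}}{2}-2x\right]_{15}^{20} \\
             &= 4\,\bigl((200-40)-(112.5-30)\bigr) \\
             &= 4 \times 77.5 = 310.
\end{align*}
```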

Straightforward Approaches No index: for each query, scan through all records. Not efficient. R-tree: maintain the objects in an R-tree (which speeds up the selection query); to compute an aggregate, select the objects and aggregate their values on the fly. Query time: O(n), since every selected object must still be visited.

Our Solution We reduce the functional box-sum problem to a simpler problem (the dominance-sum problem) and build an index specialized for computing dominance-sums. Instead of storing the original data, the specialized index stores specially aggregated information, which leads to O(log2n) query time.

Functional Box-Sum → OIFBS A special case of functional box-sum is OIFBS (Origin-Involved Functional Box-Sum), where the query box has the origin of space as a corner. A functional box-sum query can be reduced to OIFBS queries: we compute the OIFBS from the origin to the upper-right corner of the query box, then subtract the parts to the left of and below the query box (which are also OIFBS queries), adding back the lower-left part that was subtracted twice.
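In two dimensions this reduction is an inclusion-exclusion over four origin-involved queries. The sketch below assumes constant-valued objects with non-negative coordinates and answers each OIFBS by brute force, purely to show that the four terms combine to the box-sum; the function names and the data are hypothetical.

```python
def overlap(lo1, hi1, lo2, hi2):
    """Length of the overlap of intervals [lo1, hi1] and [lo2, hi2]."""
    return max(0.0, min(hi1, hi2) - max(lo1, lo2))


def oifbs(objects, px, py):
    """Box-sum for the query box [origin, (px, py)], by linear scan.

    objects: iterable of (x1, y1, x2, y2, value per unit area).
    """
    return sum(v * overlap(x1, x2, 0, px) * overlap(y1, y2, 0, py)
               for (x1, y1, x2, y2, v) in objects)


def box_sum_via_oifbs(objects, qx1, qy1, qx2, qy2):
    """Box-sum of [qx1, qx2] x [qy1, qy2] from four OIFBS queries."""
    return (oifbs(objects, qx2, qy2)     # origin to upper-right corner
            - oifbs(objects, qx1, qy2)   # part to the left of the query box
            - oifbs(objects, qx2, qy1)   # part below the query box
            + oifbs(objects, qx1, qy1))  # lower-left part, subtracted twice


# Hypothetical data, constant value per unit area.
objects = [(0, 0, 10, 10, 4), (18, 4, 25, 12, 3)]
result = box_sum_via_oifbs(objects, 5, 0, 20, 10)
```

With this data the four terms are 436, 200, 0 and 0, so the result is 236.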

Dominance-Sum Maintain a set of weighted points. Given a query point p, compute the total weight of the points dominated by p (i.e. to the lower left of p). Example (from the figure): dominance-sum = 18.
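As a reference point, a dominance-sum can be computed by a linear scan. The sketch below treats points on the boundary as dominated, which is an assumption, and uses hypothetical points chosen so that the answer matches the 18 shown on the slide.

```python
def dominance_sum(points, px, py):
    """Total weight of points (x, y, w) with x <= px and y <= py."""
    return sum(w for (x, y, w) in points if x <= px and y <= py)


# Hypothetical weighted points; the slide's figure is not preserved.
points = [(1, 1, 5), (2, 4, 6), (3, 2, 7), (6, 3, 9), (4, 7, 2)]
answer = dominance_sum(points, 4, 5)  # 5 + 6 + 7
```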

OIFBS → Dominance-Sum Idea: to insert an object (with a rectangular region), insert its corner points, associating a function with each corner. To compute the OIFBS for box [origin, p], compute the dominance-sum for p, i.e. the sum of the functions associated with the points dominated by p, and evaluate that sum at p.
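One way to realize this idea for constant-valued objects is sketched below. Each corner (cx, cy) carries the signed function s*(x - cx)*(y - cy), stored as polynomial coefficients so that functions can be added together. This particular encoding is an assumption made for illustration and is not taken from the slides; the dominance test is a linear scan standing in for an index.

```python
def corner_functions(x1, y1, x2, y2, v):
    """Yield (cx, cy, coeffs) for the four corners of a box with value v.

    coeffs = (a, b, c, d) encodes the function a*x*y + b*x + c*y + d,
    which is the expansion of s * (x - cx) * (y - cy) with sign s = +v
    at the lower-left and upper-right corners and -v at the other two.
    """
    for cx, sx in ((x1, 1), (x2, -1)):
        for cy, sy in ((y1, 1), (y2, -1)):
            s = sx * sy * v
            yield cx, cy, (s, -s * cy, -s * cx, s * cx * cy)


def oifbs_by_dominance(corners, px, py):
    """Sum the coefficient vectors of corners dominated by p, evaluate at p."""
    a = b = c = d = 0
    for cx, cy, (ca, cb, cc, cd) in corners:
        if cx <= px and cy <= py:
            a += ca
            b += cb
            c += cc
            d += cd
    return a * px * py + b * px + c * py + d


# Hypothetical data: (x1, y1, x2, y2, value per unit area).
objects = [(0, 0, 10, 10, 4), (18, 4, 25, 12, 3)]
corners = [c for obj in objects for c in corner_functions(*obj)]
value = oifbs_by_dominance(corners, 20, 10)
```

Because the stored functions are added before evaluation, an index can keep pre-aggregated coefficient sums at its internal nodes instead of visiting every corner.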

New Dominance-Sum Index For the dominance-sum problem, we propose the BA-tree: a k-d-B-tree augmented with additional information at index records. O(log2n) query time, when balanced.

Performance [Figure: functional box-sum query cost]

Summary of Our Aggregation Work The functional box-sum solution described here is to appear in [PODS’02]. Also in [PODS’02], we solved a variation: a simple box-sum aggregation problem, which is to find the total value of objects intersecting the query rectangle. We solved some other aggregation problems...

Range-Temporal Aggregation Maintain a set of temporal records, each having a key, a value and a time interval. Given a key range r and time interval i, compute the total value of records whose keys are in r and whose intervals intersect i. Appeared in [PODS’01].
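The query semantics can be pinned down with a brute-force reference. The sketch below assumes closed key ranges and closed time intervals, which the slide does not specify, and uses hypothetical records.

```python
def range_temporal_sum(records, key_lo, key_hi, t_start, t_end):
    """Total value of records whose key lies in [key_lo, key_hi] and whose
    interval [start, end] intersects [t_start, t_end].

    records: iterable of (key, value, start, end).
    """
    return sum(value for (key, value, start, end) in records
               if key_lo <= key <= key_hi
               and start <= t_end and t_start <= end)


# Hypothetical records: (key, value, start, end).
records = [(10, 100, 1, 5), (20, 200, 4, 9), (30, 300, 6, 8), (40, 400, 1, 2)]
total = range_temporal_sum(records, 10, 30, 3, 5)  # first two records qualify
```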

Temporal Aggregation over Data Streams Temporal aggregation when records accumulate in a streaming manner. Storage is limited, but we want to answer aggregation queries both for recent data and for older data. To appear in [EDBT’02].

Box-Max Aggregation Maintain a set of spatial objects, each having a spatial region and a value. Given a query region r, find the Min/Max value over all objects intersecting r. Appeared in [GIS’01].

Conclusions We have proposed specialized index structures for various complex aggregation problems. In all cases, our proposed methods have much better query performance than the existing approaches, sometimes over 100 times faster. We recommend implementing these indices in commercial DBMSs wherever aggregates need to be computed very fast.