Efficient Aggregation over Objects with Extent

Presentation transcript:

Efficient Aggregation over Objects with Extent Donghui Zhang Vassilis J. Tsotras Dimitrios Gunopulos Computer Science Department University of California, Riverside PODS’02 August 4, 2019

Content: Motivating examples; Problem definition; Straightforward approaches; Our solutions; Performance.

Why Aggregation? Aggregation computes the total value over a subset of records that satisfy some selection condition (e.g. located in a region of interest). It is an important operator for data mining, on-line query processing, data warehousing, etc. Data volumes are large; with aggregation, users can get a good summary quickly.

Why Objects with Extent? Many applications (agricultural, meteorological, geo-spatial, etc.) produce data that have spatial (plus temporal) extent. For example, a rainfall record corresponds to a region, not a single point.


Motivating Example: Consider a set of rain precipitation records, each having a region and a precipitation value. Given an arbitrary region, what is the total rainfall in this region this month?


Problem Definition: For simplicity, we focus on rectangular objects and query regions (a complex region can be decomposed into boxes). There are two problem variations: simple box-sum and functional box-sum.

Simple Box-Sum: Given a set of weighted rectangular objects and a query box q, compute the total weight of the objects intersecting q. In the illustrated example, box-sum = 3.
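As a baseline, the simple box-sum definition can be sketched by brute force. This is only an illustration: the `Box` type, the sample objects and the choice to treat boxes as closed (touching boxes count as intersecting) are assumptions, not taken from the slides.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    x1: float  # lower-left x
    y1: float  # lower-left y
    x2: float  # upper-right x
    y2: float  # upper-right y


def intersects(a: Box, b: Box) -> bool:
    # Closed boxes: sharing a boundary point counts as intersecting (assumption).
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def simple_box_sum(objects: List[Tuple[Box, float]], q: Box) -> float:
    # O(n) scan: add the weight of every object that intersects q.
    return sum(w for box, w in objects if intersects(box, q))


# Hypothetical data: the first two objects intersect q, the third does not.
objects = [(Box(0, 0, 2, 2), 1.0), (Box(1, 1, 4, 3), 2.0), (Box(6, 6, 8, 8), 5.0)]
q = Box(1.5, 1.5, 5, 5)
print(simple_box_sum(objects, q))  # 3.0
```

Each object contributes its full weight no matter how small the overlap is, which is what distinguishes this variation from the functional box-sum.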

Functional Box-Sum: In the illustrated example, the simple box-sum is 4 + 3 = 7.

Functional Box-Sum: In general, the value of an object can be a function. In the illustrated example, $\mathrm{FBS} = (11-7)\int_{15}^{20}(x-2)\,dx = 310$.

Functional Box-Sum: Given a set of objects, each having a box and a value function, and a query box q, compute the total value of the objects intersecting q, where the contribution of an object is the integral of its value function over its intersection with q.
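The definition can be illustrated with a brute-force sketch. It assumes, for simplicity, value functions of the linear form f(x, y) = a + b*x + c*y, whose integral over a rectangle is the area times the value at the rectangle's center. The names `Box`, `LinearValue` and the coordinates are hypothetical; they are chosen so that the intersection spans x in [15, 20] and y in [7, 11] with value function x - 2.

```python
from typing import List, NamedTuple, Optional, Tuple


class Box(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float


class LinearValue(NamedTuple):
    a: float  # f(x, y) = a + b*x + c*y
    b: float
    c: float


def intersection(o: Box, q: Box) -> Optional[Box]:
    x1, y1 = max(o.x1, q.x1), max(o.y1, q.y1)
    x2, y2 = min(o.x2, q.x2), min(o.y2, q.y2)
    return Box(x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def integral(f: LinearValue, r: Box) -> float:
    # Exact for linear f: area times the value at the center of r.
    area = (r.x2 - r.x1) * (r.y2 - r.y1)
    cx, cy = (r.x1 + r.x2) / 2, (r.y1 + r.y2) / 2
    return area * (f.a + f.b * cx + f.c * cy)


def functional_box_sum(objects: List[Tuple[Box, LinearValue]], q: Box) -> float:
    total = 0.0
    for box, f in objects:
        r = intersection(box, q)
        if r is not None:
            total += integral(f, r)
    return total


objects = [(Box(15, 7, 30, 11), LinearValue(a=-2, b=1, c=0))]  # f(x, y) = x - 2
q = Box(5, 0, 20, 20)
print(functional_box_sum(objects, q))  # 310.0
```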


Straightforward Approach 1: Use no index and scan through all objects. This is clearly inefficient: query time is O(n).

Straightforward Approach 2: Index the objects using an R-tree [Guttman84], reducing the problem to a range search. Optimize by storing aggregate information at internal nodes [LM01]. Nevertheless, query time is still O(n).

Challenge: Can we compute the aggregate faster? Our approach: a specialized index, with which query time reduces to log2(n).


Content: Motivating examples; Problem definition; Straightforward approaches; Our solutions (simple box-sum, functional box-sum); Performance.

Our Solution for Simple Box-Sum: We reduce a box-sum query to a set of dominance-sum queries, and we propose the BA-tree to answer the dominance-sum query.


Dominance-Sum: Given a set of weighted point objects and a query point p, compute the total weight of the objects dominated by p (i.e. to the lower left of p). In the illustrated example, dominance-sum = 18.
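A brute-force sketch of the dominance-sum definition follows. The points and weights are made up for illustration, and treating a point on the boundary as dominated (using `<=`) is an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, weight)


def dominance_sum(points: List[Point], px: float, py: float) -> float:
    # A point is dominated by p if it lies to the lower left of p.
    return sum(w for x, y, w in points if x <= px and y <= py)


# Hypothetical data: three of the four points are dominated by p = (5, 4).
points = [(1, 1, 5), (2, 3, 6), (4, 2, 7), (6, 6, 9)]
print(dominance_sum(points, 5, 4))  # 18
```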

Existing Reduction [EO82]: We proved that the reduction technique of [EO82] reduces a d-dimensional box-sum query into $\sum_{i=1}^{d}\binom{d}{i}2^{i} = \Omega(3^{d})$ dominance-sum queries.

Our New Reduction: We reduce a d-dimensional simple box-sum query to 2^d dominance-sum queries. Comparison of the number of dominance-sum queries: for d=2, [EO82] needs 8 and our reduction needs 4; for d=3, [EO82] needs 26 and our reduction needs 8.
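The comparison can be checked with a few lines of arithmetic. Two assumptions are made here: that the [EO82] count is the sum over i = 1..d of C(d, i) * 2^i (which equals 3^d - 1 and matches the 8 and 26 quoted on the slide), and that the new reduction needs 2^d queries, one per choice of lower or upper bound in each dimension. The function names are hypothetical.

```python
from math import comb


def eo82_query_count(d: int) -> int:
    # Assumed form of the [EO82] count: sum_{i=1..d} C(d, i) * 2^i = 3^d - 1.
    return sum(comb(d, i) * 2 ** i for i in range(1, d + 1))


def new_query_count(d: int) -> int:
    # Assumed count for the new reduction: 2^d dominance-sum queries.
    return 2 ** d


for d in (2, 3, 4):
    print(d, eo82_query_count(d), new_query_count(d))
```

The gap widens quickly with the dimension, since one count grows like 3^d and the other like 2^d.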

Our New Reduction, key observation: for an object o to intersect query box q, the lower-left corner of o must be dominated by the upper-right corner of q.

(Figure: the box-sum is obtained by combining dominance-sums with signs -, -, +.)
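One way to read the key observation in two dimensions is as an inclusion-exclusion over four dominance-sums, each on a different corner of the objects. The sketch below is a reconstruction under stated assumptions, not the authors' code: boxes are closed, the strictness flags are my own device for handling boundaries, and each dominance-sum is evaluated by brute force where a real system would use one index per corner set.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    x1: int
    y1: int
    x2: int
    y2: int


def dominance_sum(points, px, py, strict_x=False, strict_y=False):
    """Total weight of points (x, y, w) to the lower left of (px, py)."""
    total = 0
    for x, y, w in points:
        in_x = x < px if strict_x else x <= px
        in_y = y < py if strict_y else y <= py
        if in_x and in_y:
            total += w
    return total


def box_sum_by_reduction(objects: List[Tuple[Box, int]], q: Box) -> int:
    lower_left = [(b.x1, b.y1, w) for b, w in objects]
    lower_right = [(b.x2, b.y1, w) for b, w in objects]
    upper_left = [(b.x1, b.y2, w) for b, w in objects]
    upper_right = [(b.x2, b.y2, w) for b, w in objects]
    # Candidates: lower-left corner dominated by q's upper-right corner.
    candidates = dominance_sum(lower_left, q.x2, q.y2)
    # Candidates lying entirely to the left of q.
    left = dominance_sum(lower_right, q.x1, q.y2, strict_x=True)
    # Candidates lying entirely below q.
    below = dominance_sum(upper_left, q.x2, q.y1, strict_y=True)
    # Objects both left of and below q were subtracted twice; add them back.
    both = dominance_sum(upper_right, q.x1, q.y1, strict_x=True, strict_y=True)
    return candidates - left - below + both


def box_sum_brute_force(objects: List[Tuple[Box, int]], q: Box) -> int:
    return sum(w for b, w in objects
               if b.x1 <= q.x2 and q.x1 <= b.x2 and b.y1 <= q.y2 and q.y1 <= b.y2)


# Hypothetical data: only the second object intersects q.
objects = [(Box(0, 0, 2, 2), 1), (Box(1, 1, 4, 3), 2),
           (Box(6, 6, 8, 8), 5), (Box(0, 4, 1, 9), 7)]
q = Box(3, 2, 7, 5)
print(box_sum_by_reduction(objects, q), box_sum_brute_force(objects, q))  # 2 2
```

In this reading, "maintaining several BA-trees" would correspond to keeping one dominance-sum index per corner set.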

BA-tree (for dominance-sum): In one dimension, the BA-tree is an augmented B+-tree: along with each child pointer in an index node, it stores the total weight of the points in the sub-tree. Query and update cost: O(log(n)).
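The one-dimensional case can be illustrated in memory with a Fenwick (binary indexed) tree. This is a swapped-in technique, not the augmented B+-tree of the slides: it also keeps partial weight totals and gives O(log n) insertion and prefix-sum queries, but it assumes integer keys from a fixed range and is not disk-based. The class name `PrefixWeightIndex` is hypothetical.

```python
class PrefixWeightIndex:
    """Fenwick tree over integer keys 0..size-1 (stand-in for a 1-D BA-tree)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.tree = [0] * (size + 1)  # 1-based internal array of partial sums

    def insert(self, key: int, weight: int) -> None:
        # Add weight to every partial sum that covers this key.
        i = key + 1
        while i <= self.size:
            self.tree[i] += weight
            i += i & -i

    def dominance_sum(self, key: int) -> int:
        # Total weight of all inserted keys that are <= key.
        i = min(key, self.size - 1) + 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total


index = PrefixWeightIndex(16)
index.insert(3, 5)
index.insert(7, 2)
index.insert(10, 4)
print(index.dominance_sum(7))   # 7
print(index.dominance_sum(15))  # 11
```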

BA-tree (higher dimensions): An augmented k-d-B-tree. The k-d-B-tree [Robinson81] indexes point objects; each index record corresponds to a rectangular region; the region of a parent is fully partitioned by the regions of its children.

k-d-B-tree: (Figure sequence: a k-d-B-tree example built step by step, with root R and child regions A, B, C.)

k-d-B-tree: Compute the dominance-sum regarding point p by examining all children of R that intersect the rectangle [origin, p]. In this example: A, C, D, E, F, H.

BA-tree: Motivation for augmentation: examine a single child! The rectangle [origin, p] can be divided into four parts...

BA-tree: The four parts of the rectangle [origin, p], where F is the child containing p: (1) the part dominated by F's lower-left corner; (2) the part to the left of F; (3) the part below F; (4) the intersection with F.

BA-tree: Compute the total weight of the points in these four regions separately and add them up!

BA-tree: Total weight of objects in the region dominated by F's lower-left corner: a single value (independent of where p is); augment F with this value (called subtotal).

BA-tree: Total weight of objects in the region to the left of F: computed via a 1-dimensional BA-tree (called y-border) on the y values of all objects to the left of F.

BA-tree: Total weight of objects in the region below F: computed via a 1-dimensional BA-tree (called x-border) on the x values of all objects below F.

BA-tree: For the intersection with F, examine the sub-tree rooted at F. Only one child is examined at each level, so the query follows a single path from root to leaf.
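The four-part decomposition can be sketched for a single index node. Everything here is a simplified, static illustration under stated assumptions: the children form a regular grid given by hypothetical cut lists, regions are half-open, each border is a sorted list with prefix sums instead of a 1-dimensional BA-tree, and the part inside the child is scanned directly where the real structure would recurse into the sub-tree. The class names `Border`, `Child` and `Node` are made up.

```python
import bisect
from itertools import accumulate


class Border:
    """Stand-in for a 1-dimensional BA-tree: sorted keys plus prefix sums."""

    def __init__(self, items):
        items = sorted(items)  # (key, weight) pairs
        self.keys = [k for k, _ in items]
        self.prefix = [0] + list(accumulate(w for _, w in items))

    def sum_up_to(self, key):
        # Total weight of entries whose key is <= key.
        return self.prefix[bisect.bisect_right(self.keys, key)]


class Child:
    def __init__(self, x1, y1, x2, y2, all_points):
        self.x1, self.y1, self.x2, self.y2 = x1, y1, x2, y2
        # Points stored in this child's half-open region [x1, x2) x [y1, y2).
        self.points = [(x, y, w) for x, y, w in all_points
                       if x1 <= x < x2 and y1 <= y < y2]
        # subtotal: weight of points dominated by the lower-left corner.
        self.subtotal = sum(w for x, y, w in all_points if x < x1 and y < y1)
        # y-border: y values of points to the left of this child.
        self.y_border = Border((y, w) for x, y, w in all_points
                               if x < x1 and y1 <= y < y2)
        # x-border: x values of points below this child.
        self.x_border = Border((x, w) for x, y, w in all_points
                               if y < y1 and x1 <= x < x2)

    def contains(self, px, py):
        return self.x1 <= px < self.x2 and self.y1 <= py < self.y2


class Node:
    def __init__(self, points, cuts_x, cuts_y):
        self.children = [Child(x1, y1, x2, y2, points)
                         for x1, x2 in zip(cuts_x, cuts_x[1:])
                         for y1, y2 in zip(cuts_y, cuts_y[1:])]

    def dominance_sum(self, px, py):
        # p must lie inside the indexed space; only one child is examined.
        child = next(c for c in self.children if c.contains(px, py))
        inside = sum(w for x, y, w in child.points if x <= px and y <= py)
        return (child.subtotal + child.y_border.sum_up_to(py)
                + child.x_border.sum_up_to(px) + inside)


points = [(10, 10, 1), (30, 5, 2), (5, 60, 4), (60, 70, 8), (90, 90, 16)]
node = Node(points, cuts_x=[0, 50, 100], cuts_y=[0, 50, 100])
print(node.dominance_sum(70, 80))  # 15
```

For the query point (70, 80), the answer is assembled as subtotal 3, y-border 4, x-border 0 and 8 from inside the child.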

BA-tree insertion: besides the k-d-B-tree insertion (into the sub-tree of C), update the subtotal (of F, G), the x-border (of B) and the y-border (of H).


Summary for Our Simple Box-Sum Solution: We proposed the BA-tree, a dominance-sum index: a k-d-B-tree augmented with subtotal, x-border and y-border. Thanks to our reduction, by maintaining several BA-trees together, we can compute the simple box-sum in poly-logarithmic time (assuming a balanced tree).


Our Solution for Functional Box-Sum: First, focus on a special case: OIFBS (Origin-Involved Functional Box-Sum), where the query box contains the origin of space.


Origin-Involved special case: In the illustrated example, OIFBS = 4*(18*5) + 3*(2*11) = 426.

Our Solution for Functional Box-Sum: We reduce an OIFBS query to a dominance-sum query (solvable using the BA-tree). We show how the Functional Box-Sum query can be computed via a set of OIFBS queries.

OIFBS → dominance-sum: Idea: to insert a rectangular object, insert its four corners, associating a function with each corner. The functions should satisfy: for any point p in space, the contribution of each object to the OIFBS is equal to the sum of the functions at its corners dominated by p, evaluated at p. To compute an OIFBS regarding box [origin, p], compute the dominance-sum regarding p.

For any point (x, y) in the object region, the contribution of the object to an OIFBS query at this point is 4(x-2)(y-10). We insert the lower-left corner along with this function.

In general, at the lower-left corner (x1, y1) of an object with value function f(x, y), we should insert $g_1(x, y) = \int_{x_1}^{x}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$. What about the functions at the other corners?

If (x, y) is to the right of the object, the contribution of the object to the OIFBS is $g_2(x, y) = \int_{x_1}^{x_2}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$. It should equal the sum of the functions at the two dominated corners (lower-left and lower-right), so the function at the lower-right corner is g2-g1.

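We have proved that: at lower-left, insert v1 = g1; at lower-right, insert v2 = g2 - g1; at upper-left, insert v3 = g3 - g1; at upper-right, insert v4 = g1 + g4 - g2 - g3, where (x1, y1) and (x2, y2) are the lower-left and upper-right corners of the object and $g_1 = \int_{x_1}^{x}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$, $g_2 = \int_{x_1}^{x_2}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$, $g_3 = \int_{x_1}^{x}\int_{y_1}^{y_2} f(x', y')\,dy'\,dx'$, $g_4 = \int_{x_1}^{x_2}\int_{y_1}^{y_2} f(x', y')\,dy'\,dx'$.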
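For a constant value function f(x, y) = c, the four corner functions are bilinear polynomials and can be stored as four coefficients. The sketch below sums the functions of the dominated corners by brute force and evaluates the result at p. The coefficient layout, the function names and the object coordinates are hypothetical; the coordinates are chosen so that the two objects contribute 4*(18*5) and 3*(2*11) at p = (20, 15), and objects are assumed to lie in the positive quadrant.

```python
from typing import List, Tuple

Poly = Tuple[int, int, int, int]  # coefficients of 1, x, y, x*y


def add(p: Poly, q: Poly) -> Poly:
    return tuple(a + b for a, b in zip(p, q))


def sub(p: Poly, q: Poly) -> Poly:
    return tuple(a - b for a, b in zip(p, q))


def evaluate(p: Poly, x, y):
    c0, cx, cy, cxy = p
    return c0 + cx * x + cy * y + cxy * x * y


def corner_functions(x1, y1, x2, y2, c):
    """Corner points and their functions v1..v4 for a constant value c."""
    w, h = x2 - x1, y2 - y1
    g1 = (c * x1 * y1, -c * y1, -c * x1, c)  # c * (x - x1) * (y - y1)
    g2 = (-c * w * y1, 0, c * w, 0)          # c * (x2 - x1) * (y - y1)
    g3 = (-c * h * x1, c * h, 0, 0)          # c * (x - x1) * (y2 - y1)
    g4 = (c * w * h, 0, 0, 0)                # c * (x2 - x1) * (y2 - y1)
    return [((x1, y1), g1),                              # lower-left:  v1
            ((x2, y1), sub(g2, g1)),                     # lower-right: v2
            ((x1, y2), sub(g3, g1)),                     # upper-left:  v3
            ((x2, y2), sub(add(g1, g4), add(g2, g3)))]   # upper-right: v4


def oifbs(objects: List[Tuple[int, int, int, int, int]], px, py):
    # Dominance-sum over corner functions: combine, then evaluate at p.
    total: Poly = (0, 0, 0, 0)
    for obj in objects:
        for (cx, cy), v in corner_functions(*obj):
            if cx <= px and cy <= py:
                total = add(total, v)
    return evaluate(total, px, py)


# Hypothetical objects as (x1, y1, x2, y2, value).
objects = [(2, 10, 25, 30, 4), (18, 2, 30, 13, 3)]
print(oifbs(objects, 20, 15))  # 426
```

Because the combined function is again a small coefficient vector, an index node could store one such vector where the simple box-sum stores a single weight.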

OIFBS → dominance-sum: If f(x, y) is a polynomial of degree k, then the vi(x, y) are polynomials of degree k+2. E.g. in a previous example, f(x, y)=4, while v1(x, y)=4(x-2)(y-10). Such functions can be represented in constant space and can be combined or evaluated efficiently.

Functional Box-Sum → OIFBS: A functional box-sum query can be transformed into four OIFBS queries. (Figure: the functional box-sum over the query box equals a combination of four OIFBS queries with signs +, -, -, +.)
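The transformation can be read as the usual inclusion-exclusion over the four corners of the query box. The sketch below assumes constant value functions, objects in the positive quadrant and a brute-force `oifbs`; the function names and data are hypothetical.

```python
from typing import List, Tuple

Obj = Tuple[int, int, int, int, int]  # (x1, y1, x2, y2, constant value)


def overlap(a1, a2, b1, b2):
    # Length of the overlap of intervals [a1, a2] and [b1, b2].
    return max(0, min(a2, b2) - max(a1, b1))


def oifbs(objects: List[Obj], px, py):
    # Functional box-sum for the query box [origin, (px, py)].
    return sum(c * overlap(x1, x2, 0, px) * overlap(y1, y2, 0, py)
               for x1, y1, x2, y2, c in objects)


def fbs_direct(objects: List[Obj], qx1, qy1, qx2, qy2):
    return sum(c * overlap(x1, x2, qx1, qx2) * overlap(y1, y2, qy1, qy2)
               for x1, y1, x2, y2, c in objects)


def fbs_via_oifbs(objects: List[Obj], qx1, qy1, qx2, qy2):
    # Upper-right corner, minus the two side corners, plus the lower-left corner.
    return (oifbs(objects, qx2, qy2) - oifbs(objects, qx1, qy2)
            - oifbs(objects, qx2, qy1) + oifbs(objects, qx1, qy1))


objects = [(2, 10, 25, 30, 4), (18, 2, 30, 13, 3)]
print(fbs_via_oifbs(objects, 5, 5, 20, 15), fbs_direct(objects, 5, 5, 20, 15))  # 348 348
```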

Summary for Our Functional Box-Sum Solution: We reduced one functional box-sum query to four OIFBS queries; we reduced the OIFBS problem to the dominance-sum problem; thus we can use the BA-tree for the functional box-sum computation.


Experimental Setup: Sun Enterprise 250 Server, 8KB page size, 10MB buffer; 6 million random objects, with each edge roughly 1/10,000 of the space in size. Our disk-based, dynamic BA-tree can be easily implemented; an implementation can be found at: http://www.cs.ucr.edu/~donghui/boxaggr/

Implemented Algorithms: The BA-tree has over 200 times faster query performance than the plain R*-tree approach! We therefore report the comparison against the improved aR-tree and omit the plain R*-tree. We also implemented two extensions of the dominance-sum data structure ECDF-tree [Bentley80] to a disk-based, dynamic-update environment.

Comparing Index Sizes (figure)

Simple Box-Sum Query Cost (figure)

Functional Box-Sum Query Cost (figure)

Conclusions: We solved two variations of the box-sum problem; we reduced each variation to dominance-sums and proposed the BA-tree. With about 4 times overhead in space, we achieved a 200x query improvement over the R*-tree approach and a 30x query improvement over the aR-tree approach.

Thank you!