Efficient Aggregation over Objects with Extent Donghui Zhang Vassilis J. Tsotras Dimitrios Gunopulos Computer Science Department University of California, Riverside PODS’02 August 4, 2019
Content: Motivating examples; Problem definition; Straightforward approaches; Our solutions; Performance
Why Aggregation? Aggregation computes the total value over the subset of records that satisfy some selection condition (e.g. located in a region of interest). It is an important operator for data mining, on-line query processing, data warehousing, etc. Data volumes are large; with aggregation, a user can get a good summary quickly.
Why Objects with Extent? Many applications (agricultural, meteorological, geo-spatial, etc.) produce data that have spatial (plus temporal) extent. For example, a rainfall record corresponds to a region, not a single point.
Motivating Example: A set of rain precipitation records, each having a region and a precipitation value. Given an arbitrary region, what is the total rainfall in this region this month?
Problem Definition: For simplicity, we focus on rectangular objects and rectangular query regions (a complex region can be decomposed into boxes). Two problem variations: simple box-sum and functional box-sum.
Simple Box-Sum: Given a set of weighted rectangular objects and a query box q, compute the total weight of the objects intersecting q. [Figure: example with box-sum = 3]
Functional Box-Sum: [Figure] In this example, the simple box-sum is 4 + 3 = 7.
Functional Box-Sum: In general, an object's value can be a function. [Figure] Example: FBS = $\int_{15}^{20}\int_{7}^{11} (x-2)\,dy\,dx = 310$.
Functional Box-Sum: Given a set of objects, each having a box and a value function, and a query box q, compute the total value of the objects intersecting q, where the contribution of an object is the integral of its value function over its intersection with q.
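A brute-force evaluation can make this definition concrete. The sketch below is illustrative only: it assumes every value function is linear, f(x, y) = a*x + b*y + c, so that its integral over a rectangle equals the rectangle's area times f at the rectangle's center. The box coordinates, the tuple encoding and the function names are assumptions, not taken from the slides.

```python
def intersect(a, b):
    """Intersection of boxes (x1, y1, x2, y2), or None if it has no area."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x1 >= x2 or y1 >= y2:
        return None
    return (x1, y1, x2, y2)


def integral_linear(coef, box):
    """Exact integral of f(x, y) = a*x + b*y + c over box."""
    a, b, c = coef
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # For a linear function, the mean over a rectangle is its value at the center.
    return area * (a * cx + b * cy + c)


def functional_box_sum(objects, q):
    """Scan all (box, coef) objects and integrate each over its overlap with q."""
    total = 0.0
    for box, coef in objects:
        inter = intersect(box, q)
        if inter is not None:
            total += integral_linear(coef, inter)
    return total


# Hypothetical object with f(x, y) = x - 2; its overlap with q is [15, 20] x [7, 11].
objects = [((10, 7, 20, 11), (1, 0, -2))]
q = (15, 0, 30, 30)
print(functional_box_sum(objects, q))  # 310.0
```

This is the O(n) baseline; the rest of the talk is about avoiding the scan.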
Straightforward Approach 1: No index. Scan through all objects and add up those that intersect q. Not efficient: query time is O(n).
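The scan can be written in a few lines. This is a minimal sketch, assuming closed boxes encoded as (x1, y1, x2, y2) tuples paired with a weight; the sample data is hypothetical.

```python
def intersects(a, b):
    """Closed boxes intersect if they overlap in every dimension."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def box_sum_scan(objects, q):
    """Simple box-sum by scanning every (box, weight) pair: O(n) per query."""
    return sum(w for box, w in objects if intersects(box, q))


# Hypothetical data: the first two objects intersect q, the third does not.
objects = [((1, 1, 4, 3), 1), ((3, 2, 6, 6), 2), ((7, 7, 9, 9), 5)]
print(box_sum_scan(objects, (2, 2, 5, 5)))  # 3
```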
Straightforward Approach 2: Index the objects using an R-tree [Guttman84] and reduce the query to a range search. Optimize by storing aggregate information at internal nodes [LM01]. Nevertheless, query time is still O(n) in the worst case.
Challenge: Can we compute the aggregate faster? Our approach: a specialized index, with which query time reduces to log2(n).
Content: Motivating examples; Problem definition; Straightforward approaches; Our solutions (simple box-sum, functional box-sum); Performance
Our Solution for Simple Box-Sum: We reduce a box-sum query to a set of dominance-sum queries, and we propose the BA-tree to answer dominance-sum queries.
Dominance-Sum: Given a set of weighted point objects and a query point p, compute the total weight of the objects dominated by p (i.e. to the lower left of p). [Figure: example with dominance-sum = 18]
Existing Reduction [EO82]: We proved that the reduction technique of [EO82] reduces a d-dimensional box-sum query into $\sum_{i=1}^{d} \binom{d}{i} 2^{i} = \Omega(3^{d})$ dominance-sum queries.
Our New Reduction: We reduce a d-dimensional simple box-sum query to 2^d dominance-sum queries. Comparison of the number of dominance-sum queries:
d    [EO82]    Our reduction
2    8         4
3    26        8
Our New Reduction: The key observation is that, for an object o to intersect query box q, the lower-left corner of o must be dominated by the upper-right corner of q.
[Figure: the box-sum written as a signed combination of dominance-sums, with signs +, -, -, +]
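One way to read the reduction in two dimensions is as inclusion-exclusion over the two ways an object can miss q per dimension. The sketch below is an assumption-laden illustration, not the authors' code: it uses closed boxes, so an object misses q in x only if its right edge is strictly left of q, and it answers each dominance-sum by brute force where a real system would query a BA-tree. All names are hypothetical.

```python
def dominance_sum(points, p, strict=(False, False)):
    """Total weight of points (x, y, w) dominated by p.
    strict[i] = True makes dimension i use '<' instead of '<='."""
    def below(v, bound, is_strict):
        return v < bound if is_strict else v <= bound
    return sum(w for x, y, w in points
               if below(x, p[0], strict[0]) and below(y, p[1], strict[1]))


def box_sum_via_dominance(objects, q):
    """Simple box-sum as 2^2 = 4 dominance-sums over four corner point sets."""
    qx1, qy1, qx2, qy2 = q
    ll = [(x1, y1, w) for (x1, y1, x2, y2), w in objects]  # lower-left corners
    lr = [(x2, y1, w) for (x1, y1, x2, y2), w in objects]  # lower-right corners
    ul = [(x1, y2, w) for (x1, y1, x2, y2), w in objects]  # upper-left corners
    ur = [(x2, y2, w) for (x1, y1, x2, y2), w in objects]  # upper-right corners
    a = dominance_sum(ll, (qx2, qy2))                 # starts before q ends, in x and y
    b = dominance_sum(lr, (qx1, qy2), (True, False))  # ...but ends left of q
    c = dominance_sum(ul, (qx2, qy1), (False, True))  # ...but ends below q
    d = dominance_sum(ur, (qx1, qy1), (True, True))   # subtracted twice, add back
    return a - b - c + d


def box_sum_scan(objects, q):
    """Reference answer by linear scan."""
    return sum(w for (x1, y1, x2, y2), w in objects
               if x1 <= q[2] and q[0] <= x2 and y1 <= q[3] and q[1] <= y2)


objects = [((1, 1, 4, 3), 1), ((3, 2, 6, 6), 2), ((7, 7, 9, 9), 5)]
q = (2, 2, 5, 5)
print(box_sum_via_dominance(objects, q), box_sum_scan(objects, q))  # 3 3
```

Each of the four point sets would be kept in its own dominance-sum index, which is why several BA-trees are maintained together.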
BA-tree (for dominance-sum), 1-dimensional: an augmented B+-tree. Along with each child pointer in an index node, store the total weight of the points in that sub-tree. Query and update: O(log(n)).
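The idea of storing sub-tree totals so that a prefix sum touches only O(log n) entries can be illustrated in memory with a Fenwick (binary indexed) tree. This is a swapped-in stand-in, not the disk-based augmented B+-tree of the slides, and it assumes integer keys in a fixed range [0, size).

```python
class Fenwick:
    """In-memory analog of a 1-dimensional dominance-sum index.
    tree[i] holds the total weight of a block of keys ending at key i - 1."""

    def __init__(self, size):
        self.tree = [0] * (size + 1)

    def add(self, key, weight):
        """Insert a point with the given key and weight: O(log size)."""
        i = key + 1
        while i < len(self.tree):
            self.tree[i] += weight
            i += i & -i

    def prefix_sum(self, key):
        """Total weight of all points with key' <= key: O(log size)."""
        i = min(key, len(self.tree) - 2) + 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total


index = Fenwick(10)
index.add(3, 5)
index.add(7, 2)
index.add(3, 1)
print(index.prefix_sum(3), index.prefix_sum(7))  # 6 8
```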
BA-tree (higher dimensions): an augmented k-d-B-tree. The k-d-B-tree [Robinson81] indexes point objects; each index record corresponds to a rectangular region; the region of a parent is fully partitioned by the regions of its children.
k-d-B-tree: [Figure: example k-d-B-tree built up over several slides, with nodes labeled R, A, B, C]
k-d-B-tree: Compute the dominance-sum regarding point p by examining all children that intersect the rectangle [origin, p]. In this example: A, C, D, E, F, H. [Figure]
BA-tree: Motivation for augmentation: examine a single child! Let F be the child whose region contains p. The rectangle [origin, p] can be divided into four parts: (1) the part dominated by F's lower-left corner; (2) the part to the left of F; (3) the part below F; (4) the intersection with F. Compute the total weight of the points in these four regions separately and add them up!
BA-tree, part 1 (dominated by F's lower-left corner): the total weight of the objects in this region is a single value, independent of where p is; augment F with this value (called subtotal).
BA-tree, part 2 (to the left of F): the total weight of the objects in this region is computed via a 1-dimensional BA-tree (called y-border) over the y values of all objects to the left of F.
BA-tree, part 3 (below F): the total weight of the objects in this region is computed via a 1-dimensional BA-tree (called x-border) over the x values of all objects below F.
BA-tree, part 4 (intersection with F): examine the sub-tree rooted at F. Only one child is examined at each level, thus the query follows a single path from root to leaf.
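The four-part split can be checked on a flat list of points. The sketch below only illustrates the decomposition at one level: each part is computed by brute force here, whereas in the BA-tree the first would be the stored subtotal, the second and third would be 1-dimensional border lookups, and the fourth a recursive descent into F. It assumes p lies inside F and that points on F's lower or left edge belong to F; all names are hypothetical.

```python
def dominance_sum(points, p):
    """Reference: total weight of points (x, y, w) with x <= p.x and y <= p.y."""
    return sum(w for x, y, w in points if x <= p[0] and y <= p[1])


def four_part_sum(points, f_corner, p):
    """Split the dominance-sum at p by the lower-left corner of child region F."""
    fx, fy = f_corner
    px, py = p
    # Part 1: dominated by F's lower-left corner (does not depend on p).
    subtotal = sum(w for x, y, w in points if x < fx and y < fy)
    # Part 2: left of F, within the y-range [fy, py] (role of the y-border).
    y_border = sum(w for x, y, w in points if x < fx and fy <= y <= py)
    # Part 3: below F, within the x-range [fx, px] (role of the x-border).
    x_border = sum(w for x, y, w in points if fx <= x <= px and y < fy)
    # Part 4: inside F and dominated by p (handled by recursion into F).
    inside = sum(w for x, y, w in points if fx <= x <= px and fy <= y <= py)
    return subtotal, y_border, x_border, inside


points = [(1, 1, 2), (2, 5, 3), (6, 1, 4), (6, 6, 5), (7, 8, 1), (9, 9, 7)]
parts = four_part_sum(points, (5, 4), (8, 8))
print(parts, sum(parts), dominance_sum(points, (8, 8)))  # (2, 3, 4, 6) 15 15
```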
BA-tree insertion: besides the k-d-B-tree insertion (into the sub-tree of C), update the subtotal (of F, G), the x-border (of B) and the y-border (of H). [Figure]
Summary for Our Simple Box-Sum Solution: We proposed the BA-tree, a dominance-sum index: a k-d-B-tree augmented with subtotal, x-border and y-border. Due to our reduction, by maintaining several BA-trees together, we can compute the simple box-sum in poly-logarithmic time (assuming a balanced tree).
Our Solution for Functional Box-Sum: First, focus on a special case: OIFBS (Origin-Involved Functional Box-Sum), where the query box contains the origin of space.
Origin-Involved special case: [Figure] OIFBS = 4*(18*5) + 3*(2*11) = 426.
Our Solution for Functional Box-Sum: We reduce an OIFBS query to a dominance-sum query (solvable using the BA-tree). We show how the functional box-sum query can be computed via a set of OIFBS queries.
OIFBS to dominance-sum: Idea: to insert a rectangular object, insert its four corners, associating a function with each corner. The functions should satisfy: for any point p in space, the contribution of each object to the OIFBS regarding [origin, p] is equal to the sum of the functions at its corners dominated by p, evaluated at p. To compute an OIFBS regarding box [origin, p], compute the dominance-sum regarding p.
Example: for any point (x, y) in the object region, the contribution of the object to an OIFBS query at this point is 4(x-2)(y-10). We insert the lower-left corner along with this function.
In general, at the lower-left corner (x1, y1) of an object with value function f(x, y), we should insert $g_1(x, y) = \int_{x_1}^{x}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$. What are the functions at the other corners?
If (x, y) is to the right of the object, the contribution of the object to the OIFBS is $g_2(x, y) = \int_{x_1}^{x_2}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$. It should equal the sum of the functions at the two lower corners, so the function at the lower-right corner is g2-g1.
We have proved that, for an object with lower-left corner (x1, y1) and upper-right corner (x2, y2): at lower-left, insert v1 = g1; at lower-right, insert v2 = g2-g1; at upper-left, insert v3 = g3-g1; at upper-right, insert v4 = g1+g4-g2-g3, where $g_1 = \int_{x_1}^{x}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$, $g_2 = \int_{x_1}^{x_2}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$, $g_3 = \int_{x_1}^{x}\int_{y_1}^{y_2} f(x', y')\,dy'\,dx'$, $g_4 = \int_{x_1}^{x_2}\int_{y_1}^{y_2} f(x', y')\,dy'\,dx'$.
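The corner functions can be sanity-checked for the simplest case, a constant value function f(x, y) = c, where the integrals reduce to products of side lengths. The sketch below is an illustration under stated assumptions: objects lie in the positive quadrant with the origin at (0, 0), the dominance-sum is done by brute force rather than with a BA-tree, and the two object boxes are hypothetical coordinates chosen to reproduce the arithmetic 4*(18*5) + 3*(2*11).

```python
def corner_entries(box, c):
    """Four (corner, function) pairs for an object with constant value c."""
    x1, y1, x2, y2 = box
    g1 = lambda x, y: c * (x - x1) * (y - y1)
    g2 = lambda x, y: c * (x2 - x1) * (y - y1)
    g3 = lambda x, y: c * (x - x1) * (y2 - y1)
    g4 = lambda x, y: c * (x2 - x1) * (y2 - y1)
    return [
        ((x1, y1), g1),                                  # v1 at lower-left
        ((x2, y1), lambda x, y: g2(x, y) - g1(x, y)),    # v2 at lower-right
        ((x1, y2), lambda x, y: g3(x, y) - g1(x, y)),    # v3 at upper-left
        ((x2, y2), lambda x, y: g1(x, y) + g4(x, y) - g2(x, y) - g3(x, y)),  # v4
    ]


def oifbs_via_dominance(entries, p):
    """Sum the functions of all corners dominated by p, evaluated at p."""
    px, py = p
    return sum(v(px, py) for (cx, cy), v in entries if cx <= px and cy <= py)


def oifbs_direct(objects, p):
    """Reference: value times overlap area with the box [origin, p]."""
    total = 0
    for (x1, y1, x2, y2), c in objects:
        w = min(x2, p[0]) - max(x1, 0)
        h = min(y2, p[1]) - max(y1, 0)
        if w > 0 and h > 0:
            total += c * w * h
    return total


# Hypothetical boxes: value 4 with lower-left corner (2, 10), and value 3.
objects = [((2, 10, 30, 40), 4), ((18, 4, 25, 50), 3)]
entries = [e for box, c in objects for e in corner_entries(box, c)]
p = (20, 15)
print(oifbs_via_dominance(entries, p), oifbs_direct(objects, p))  # 426 426
```

An index would not evaluate each corner function separately; it would add up the polynomial coefficients of the dominated corners and evaluate the combined polynomial once at p.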
OIFBS to dominance-sum: If f(x, y) is a polynomial of degree k, then the functions vi(x, y) are polynomials of degree k+2. E.g. in a previous example, f(x, y) = 4, while v1(x, y) = 4(x-2)(y-10). Such functions can be represented in constant space and can be combined or evaluated efficiently.
Functional Box-Sum to OIFBS: A functional box-sum query can be transformed into four OIFBS queries.
[Figure: the functional box-sum over the query box equals a combination of four origin-involved queries, with signs +, -, -, +]
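For a query box in the positive quadrant, the transformation can be sketched as inclusion-exclusion over the four corners of q. The code below is a hedged illustration with constant value functions and brute-force OIFBS evaluation; the object boxes and function names are hypothetical, and the origin is assumed to be (0, 0).

```python
def fbs_direct(objects, q):
    """Reference functional box-sum for constant values: value times overlap area."""
    total = 0
    for (x1, y1, x2, y2), c in objects:
        w = min(x2, q[2]) - max(x1, q[0])
        h = min(y2, q[3]) - max(y1, q[1])
        if w > 0 and h > 0:
            total += c * w * h
    return total


def oifbs(objects, p):
    """Origin-involved query: the functional box-sum over [origin, p]."""
    return fbs_direct(objects, (0, 0, p[0], p[1]))


def fbs_via_oifbs(objects, q):
    """Functional box-sum over q = (x1, y1, x2, y2) from four OIFBS queries."""
    x1, y1, x2, y2 = q
    return (oifbs(objects, (x2, y2)) - oifbs(objects, (x1, y2))
            - oifbs(objects, (x2, y1)) + oifbs(objects, (x1, y1)))


objects = [((2, 10, 30, 40), 4), ((18, 4, 25, 50), 3)]
q = (15, 7, 20, 11)
print(fbs_via_oifbs(objects, q), fbs_direct(objects, q))  # 44 44
```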
Summary for Our Functional Box-Sum Solution: We reduced one functional box-sum query to four OIFBS queries; we reduced the OIFBS problem to the dominance-sum problem; thus we can use the BA-tree for the functional box-sum computation.
Experimental Setup: Sun Enterprise 250 Server, 8KB page size, 10MB buffer; 6 million random objects, size of each edge: roughly 1/10,000 of space. Our disk-based, dynamic BA-tree can be easily implemented; the implementation can be found at: http://www.cs.ucr.edu/~donghui/boxaggr/
Implemented Algorithms: The BA-tree has over 200 times faster query performance than the plain R*-tree approach, so we report the comparison against the improved aR-tree and omit the plain R*-tree. We also implemented two extensions of the dominance-sum data structure ECDF-tree [Bentley80] to a disk-based, dynamic-update environment.
Comparing Index Sizes [Figure]
Simple Box-Sum Query Cost [Figure]
Functional Box-Sum Query Cost [Figure]
Conclusions: We solved two variations of the box-sum problem; we reduced each variation to dominance-sums and proposed the BA-tree; with about 4 times overhead in space, we achieved a 200x query improvement over the R*-tree approach and a 30x query improvement over the aR-tree approach.
Thank you!