Published by Ἰάειρος Αντωνόπουλος. Modified over 5 years ago.
1
Efficient Aggregation over Objects with Extent
Donghui Zhang Vassilis J. Tsotras Dimitrios Gunopulos Computer Science Department University of California, Riverside PODS’02 August 4, 2019
2
Content
Motivating examples
Problem definition
Straightforward approaches
Our solutions
Performance
3
Why Aggregation?
Aggregation: compute the total value over a subset of records that satisfy some selection condition (e.g. located in a region of interest). It is an important operator for data mining, on-line query processing, data warehousing, etc. Data volumes are large; with aggregation, a user can get a good summary quickly.
4
Why Objects with Extent?
Many applications (agricultural, meteorological, geo-spatial, etc.) produce data that have spatial (plus temporal) extent. For example, a rainfall record corresponds to a region, not a single point.
6
Motivating Example
A set of rain precipitation records, each having a region and a precipitation value. Given an arbitrary region, what is the total rainfall in this region this month?
7
Content
Motivating examples
Problem definition
Straightforward approaches
Our solutions
Performance
8
Problem Definition
For simplicity, we focus on rectangular objects and query regions (a complex region can be decomposed into boxes). Two problem variations: simple box-sum and functional box-sum.
9
Simple Box-Sum
A set of weighted rectangular objects; given a query box q, compute the total weight of the objects intersecting q. In the pictured example, box-sum = 3.
10
Functional Box-Sum
In the pictured example, the simple box-sum is 4 + 3 = 7.
11
Functional Box-Sum
In general, the object value can be a function. In the pictured example, $\mathrm{FBS} = \int_{15}^{20}\int_{7}^{11} (x-2)\,dy\,dx = 310$.
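The value 310 can be checked by hand. The steps below read the slide's example as the value function f(x, y) = x - 2 integrated over the intersection [15, 20] × [7, 11]; this reading is an interpretation of the extracted formula, not something the transcript states outright.

```latex
\begin{aligned}
\int_{15}^{20}\int_{7}^{11} (x-2)\,dy\,dx
  &= (11-7)\int_{15}^{20} (x-2)\,dx \\
  &= 4\left[\frac{(x-2)^{2}}{2}\right]_{15}^{20}
   = 2\,(18^{2}-13^{2}) = 2 \cdot 155 = 310 .
\end{aligned}
```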
12
Functional Box-Sum
A set of objects, each having a box and a value function. Given a query box q, compute the total value of the objects intersecting q, where the contribution of an object is the integral of its value function over its intersection with q.
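As a minimal sketch of this definition, the brute-force Python below handles only the special case where every value function is a constant, so the integral reduces to value times intersection area. The function names and the object coordinates are hypothetical; the coordinates are chosen so that the result matches the 4*(18*5) + 3*(2*11) = 426 example that appears later in the deck.

```python
def overlap(lo1, hi1, lo2, hi2):
    """Length of the intersection of intervals [lo1, hi1] and [lo2, hi2]."""
    return max(0.0, min(hi1, hi2) - max(lo1, lo2))


def functional_box_sum_scan(objects, q):
    """Brute-force functional box-sum for constant value functions.

    objects: iterable of (x1, y1, x2, y2, value); q: (qx1, qy1, qx2, qy2).
    """
    qx1, qy1, qx2, qy2 = q
    total = 0.0
    for x1, y1, x2, y2, value in objects:
        area = overlap(x1, x2, qx1, qx2) * overlap(y1, y2, qy1, qy2)
        total += value * area  # integral of a constant over the intersection
    return total


# Hypothetical layout: one object of value 4 and one of value 3.
objects = [(2, 10, 25, 20, 4), (5, 3, 16, 5, 3)]
result = functional_box_sum_scan(objects, (0, 0, 20, 15))
print(result)  # 4*(18*5) + 3*(11*2)
```

A non-constant value function would need its integral over the intersection instead of `value * area`.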
13
Content
Motivating examples
Problem definition
Straightforward approaches
Our solutions
Performance
14
Straightforward Approach 1
No index: scan through all objects. Not efficient: query time is O(n).
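The linear scan can be sketched in a few lines of Python. The `Obj` type, the function names and the sample data are illustrative assumptions, not taken from the slides; boxes are treated as closed rectangles.

```python
from typing import Iterable, NamedTuple, Tuple


class Obj(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float
    weight: float


Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def intersects(o: Obj, q: Box) -> bool:
    """True if the closed rectangle o shares at least one point with q."""
    qx1, qy1, qx2, qy2 = q
    return o.x1 <= qx2 and qx1 <= o.x2 and o.y1 <= qy2 and qy1 <= o.y2


def simple_box_sum_scan(objects: Iterable[Obj], q: Box) -> float:
    """O(n) scan: add the whole weight of every object that intersects q."""
    return sum(o.weight for o in objects if intersects(o, q))


objects = [Obj(0, 0, 2, 2, 1.0), Obj(1, 1, 4, 3, 2.0), Obj(5, 5, 6, 6, 4.0)]
print(simple_box_sum_scan(objects, (1.5, 1.5, 3, 3)))
```

Every query touches all n objects, which is the cost the later slides aim to avoid.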
15
Straightforward Approach 2
Index the objects using R-tree [Guttman84]. Reduce to range search. Optimize by storing aggregate information at internal nodes [LM01]. Nevertheless, query time is still O(n). August 4, 2019
16
Challenge
Can we compute the aggregate faster? Our approach: a specialized index that reduces the query time to $\log^2(n)$.
17
Content
Motivating examples
Problem definition
Straightforward approaches
Our solutions
Performance
18
Content
Motivating examples
Problem definition
Straightforward approaches
Our solutions: simple box-sum, functional box-sum
Performance
19
Our Solution for Simple Box-Sum
We reduce a box-sum query to a set of dominance-sum queries, and we propose the BA-tree to answer the dominance-sum query.
20
Dominance-Sum
A set of weighted point objects; given a query point p, compute the total weight of the objects dominated by p (i.e. to the lower left of p).
21
Dominance-Sum
In the pictured example, dominance-sum = 18.
22
Existing Reduction [EO82]
We proved that the reduction technique of [EO82] reduces a d-dimensional box-sum query into $\sum_{i=1}^{d} \binom{d}{i} 2^{i} = \Omega(3^{d})$ dominance-sum queries.
23
Our New Reduction
We reduce a d-dimensional simple box-sum query to $2^d$ dominance-sum queries. Comparison:

       [EO82]   Our reduction
d=2    8        4
d=3    26       8
24
Our New Reduction
Key observation: for an object o to intersect the query box q, the lower-left corner of o must be dominated by the upper-right corner of q.
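One way to turn the key observation into four dominance-sums in two dimensions is inclusion-exclusion: start from the objects whose lower-left corner is dominated by q's upper-right corner, subtract those lying entirely to the left of q and those lying entirely below q, and add back those counted twice. The sketch below is an assumption about how the reduction can be realized, since the slide's figure is not recoverable from the transcript. It uses brute-force dominance-sums, and the `strict_x` / `strict_y` flags are a hypothetical device for handling closed-box boundaries.

```python
def dominance_sum(points, px, py, strict_x=False, strict_y=False):
    """Total weight of points (x, y, w) dominated by (px, py); brute force."""
    total = 0
    for x, y, w in points:
        in_x = x < px if strict_x else x <= px
        in_y = y < py if strict_y else y <= py
        if in_x and in_y:
            total += w
    return total


def box_sum_by_scan(objects, q):
    """Reference answer: scan all boxes (x1, y1, x2, y2, w) against q."""
    qx1, qy1, qx2, qy2 = q
    return sum(w for x1, y1, x2, y2, w in objects
               if x1 <= qx2 and qx1 <= x2 and y1 <= qy2 and qy1 <= y2)


def box_sum_by_dominance(objects, q):
    """Simple box-sum as a signed combination of four dominance-sums."""
    qx1, qy1, qx2, qy2 = q
    # One point set per corner type; an index would keep one structure for each.
    lower_left = [(x1, y1, w) for x1, y1, x2, y2, w in objects]
    lower_right = [(x2, y1, w) for x1, y1, x2, y2, w in objects]
    upper_left = [(x1, y2, w) for x1, y1, x2, y2, w in objects]
    upper_right = [(x2, y2, w) for x1, y1, x2, y2, w in objects]

    candidates = dominance_sum(lower_left, qx2, qy2)
    left_of_q = dominance_sum(lower_right, qx1, qy2, strict_x=True)
    below_q = dominance_sum(upper_left, qx2, qy1, strict_y=True)
    left_and_below = dominance_sum(upper_right, qx1, qy1,
                                   strict_x=True, strict_y=True)
    return candidates - left_of_q - below_q + left_and_below


objects = [(0, 0, 2, 2, 1), (1, 1, 4, 3, 2), (5, 5, 6, 6, 4), (0, 4, 1, 6, 8)]
query = (1, 2, 5, 4)
print(box_sum_by_dominance(objects, query), box_sum_by_scan(objects, query))
```

Replacing each brute-force `dominance_sum` with an indexed structure is what would make the query sub-linear; the four point sets correspond to maintaining several dominance-sum indexes side by side.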
26
[Figure: the box-sum expressed as one dominance-sum, minus two others, plus a fourth.]
27
BA-tree (for dominance-sum)
1-dimensional: an augmented B+-tree. Along with each child pointer in an index node, store the total weight of the points in the sub-tree. Query and update: O(log n).
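The augmentation idea can be illustrated with a much simpler structure than a B+-tree. The sketch below swaps in a plain, unbalanced, in-memory binary search tree whose nodes store the total weight of their sub-tree; it is not the BA-tree itself, and its operations are O(log n) only when the tree happens to be balanced. All class and method names are hypothetical.

```python
class Node:
    def __init__(self, key, weight):
        self.key = key
        self.weight = weight   # weight stored at exactly this key
        self.total = weight    # total weight of the sub-tree rooted here
        self.left = None
        self.right = None


class AugmentedTree1D:
    """Binary search tree with sub-tree totals (stand-in for the augmented B+-tree)."""

    def __init__(self):
        self.root = None

    def insert(self, key, weight):
        if self.root is None:
            self.root = Node(key, weight)
            return
        node = self.root
        while True:
            node.total += weight          # every node on the path gains the weight
            if key == node.key:
                node.weight += weight
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key, weight))
                return
            node = child

    def dominance_sum(self, p):
        """Total weight of keys <= p, following a single root-to-leaf path."""
        total = 0
        node = self.root
        while node is not None:
            if p < node.key:
                node = node.left
            else:
                total += node.weight
                if node.left is not None:
                    total += node.left.total  # whole left sub-tree is <= p
                node = node.right
        return total


tree = AugmentedTree1D()
for key, weight in [(5, 2), (1, 3), (8, 4), (3, 1), (5, 10)]:
    tree.insert(key, weight)
print(tree.dominance_sum(4), tree.dominance_sum(5))
```

The stored totals let a query skip an entire sub-tree in one step, which is the same role the per-child totals play in the index nodes described above.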
28
BA-tree (higher dimensions)
An augmented k-d-B-tree. The k-d-B-tree [Robinson81] indexes point objects; each index record corresponds to a rectangular region; the region of a parent is fully partitioned by the regions of its children.
29
k-d-B-tree
[Figure sequence, slides 29 to 35: a k-d-B-tree with root R and regions A, B, C.]
36
k-d-B-tree
Compute the dominance-sum regarding point p by examining all children that intersect the rectangle [origin, p]. In this example: A, C, D, E, F, H.
37
BA-tree
Motivation for augmentation: examine a single child! The rectangle [origin, p] can be divided into four parts...
38
BA-tree
The part dominated by F’s lower-left corner.
39
BA-tree
The part to the left of F.
40
BA-tree
The part below F.
41
BA-tree
The intersection with F.
42
BA-tree
Compute the total weight of the points in these four regions separately and add them up!
43
BA-tree
Total weight of objects in the region dominated by F’s lower-left corner: a single value (independent of where p is); augment F with this value (called subtotal).
44
BA-tree
Total weight of objects in the region to the left of F: computed via a 1-dimensional BA-tree (called y-border) for the y values of all objects to the left of F.
45
BA-tree
Total weight of objects in the region below F: computed via a 1-dimensional BA-tree (called x-border) for the x values of all objects below F.
46
BA-tree
For the intersection with F, examine the sub-tree rooted at F. Only one child is examined, and thus a single path from root to leaf.
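The four-part decomposition can be checked on a single index level with brute force. In this sketch the cell F is assumed to be a half-open rectangle [fx1, fx2) × [fy1, fy2) containing p, and the subtotal and the two borders are recomputed on every call as plain sums and lists; in the structure described above they would be a precomputed value and two 1-dimensional BA-trees. All names and data are hypothetical.

```python
def dominance_sum(points, px, py):
    """Reference answer: total weight of points (x, y, w) with x <= px, y <= py."""
    return sum(w for x, y, w in points if x <= px and y <= py)


def four_part_sum(points, cell, px, py):
    """Dominance-sum at p assembled from the four parts around cell F."""
    fx1, fy1, fx2, fy2 = cell
    assert fx1 <= px < fx2 and fy1 <= py < fy2, "p must lie inside F"

    # Part 1: dominated by F's lower-left corner; does not depend on p.
    subtotal = sum(w for x, y, w in points if x < fx1 and y < fy1)

    # Part 2: left of F; a 1-D dominance query on y over the y-border.
    y_border = [(y, w) for x, y, w in points if x < fx1 and fy1 <= y < fy2]
    left = sum(w for y, w in y_border if y <= py)

    # Part 3: below F; a 1-D dominance query on x over the x-border.
    x_border = [(x, w) for x, y, w in points if y < fy1 and fx1 <= x < fx2]
    below = sum(w for x, w in x_border if x <= px)

    # Part 4: inside F; in a tree this is the recursive step into F's sub-tree.
    inside = sum(w for x, y, w in points
                 if fx1 <= x <= px and fy1 <= y <= py)

    return subtotal + left + below + inside


points = [(10, 10, 1), (10, 40, 2), (50, 10, 4),
          (50, 40, 8), (80, 80, 16), (60, 50, 32)]
cell_f = (40, 30, 70, 60)
print(four_part_sum(points, cell_f, 55, 45), dominance_sum(points, 55, 45))
```

Only the fourth part requires descending further, which is why a query follows one child per level.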
47
BA-tree
Insertion: besides the k-d-B-tree insertion (into the sub-tree of C), update the subtotal (of F, G), the x-border (of B) and the y-border (of H).
49
Summary for Our Simple Box-Sum Solution
We proposed the BA-tree, a dominance-sum index: a k-d-B-tree augmented with subtotal, x-border and y-border. Thanks to our reduction, by maintaining several BA-trees together we can compute the simple box-sum in poly-logarithmic time (assuming a balanced tree).
50
Content
Motivating examples
Problem definition
Straightforward approaches
Our solutions: simple box-sum, functional box-sum
Performance
51
Our Solution for Functional Box-Sum
First, focus on a special case: OIFBS (Origin-Involved Functional Box-Sum), where the query box contains the origin of space.
52
Functional Box-Sum
53
Origin-Involved special case
OIFBS = 4*(18*5) + 3*(2*11) = 426.
54
Our Solution for Functional Box-Sum
We reduce an OIFBS query to a dominance-sum query (solvable using the BA-tree). We show how the functional box-sum query can be computed via a set of OIFBS queries.
55
OIFBS → dominance-sum
Idea: to insert a rectangular object, insert its four corners, associating a function with each corner. The functions should satisfy: for any point p in space, the contribution of each object to the OIFBS is equal to the sum of the functions at its corners dominated by p. To compute an OIFBS regarding the box [origin, p], compute the dominance-sum regarding p.
56
For any point (x, y) in the object region, the contribution of the object to an OIFBS query at this point is: 4(x-2)(y-10). We insert the lower-left corner along with the above function.
57
In general, at the lower-left corner (x1, y1) of an object with value function f(x, y), we should insert:
$g_1(x, y) = \int_{x_1}^{x}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$
The functions at the other corners?
58
If (x, y) is to the right of the object, the contribution of the object to the OIFBS is:
$g_2(x, y) = \int_{x_1}^{x_2}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$
It should equal the sum of the functions at the two dominated corners. So the function at the lower-right corner is g2-g1.
59
We have proved that:
at lower-left: insert v1 = g1
at lower-right: insert v2 = g2 - g1
at upper-left: insert v3 = g3 - g1
at upper-right: insert v4 = g1 + g4 - g2 - g3
where
$g_1 = \int_{x_1}^{x}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$
$g_2 = \int_{x_1}^{x_2}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$
$g_3 = \int_{x_1}^{x}\int_{y_1}^{y_2} f(x', y')\,dy'\,dx'$
$g_4 = \int_{x_1}^{x_2}\int_{y_1}^{y_2} f(x', y')\,dy'\,dx'$
60
OIFBS → dominance-sum
If f(x, y) is a polynomial of degree k, then the vi(x, y) are polynomials of degree k+2. E.g. in a previous example, f(x, y) = 4, while v1(x, y) = 4(x-2)(y-10). Such functions can be represented in constant space and can be combined or evaluated efficiently.
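For a constant value function f(x, y) = c, each corner function is a bilinear polynomial and can be stored as four coefficients, which makes "combined or evaluated efficiently" concrete: combining is coefficient-wise addition, and evaluation happens once at p. The sketch below is restricted to that constant case, assumes all objects lie in the positive quadrant with the origin at (0, 0), and uses a brute-force dominance test in place of an index. The function names and the object coordinates are hypothetical.

```python
def add(a, b):
    return tuple(u + v for u, v in zip(a, b))


def sub(a, b):
    return tuple(u - v for u, v in zip(a, b))


def corner_functions(x1, y1, x2, y2, c):
    """Corner functions for f = c, as coefficients of (xy, x, y, 1)."""
    w, h = x2 - x1, y2 - y1
    g1 = (c, -c * y1, -c * x1, c * x1 * y1)   # c (x - x1)(y - y1)
    g2 = (0, 0, c * w, -c * w * y1)           # c (x2 - x1)(y - y1)
    g3 = (0, c * h, 0, -c * h * x1)           # c (x - x1)(y2 - y1)
    g4 = (0, 0, 0, c * w * h)                 # c (x2 - x1)(y2 - y1)
    return [
        ((x1, y1), g1),                             # lower-left:  v1 = g1
        ((x2, y1), sub(g2, g1)),                    # lower-right: v2 = g2 - g1
        ((x1, y2), sub(g3, g1)),                    # upper-left:  v3 = g3 - g1
        ((x2, y2), sub(add(g1, g4), add(g2, g3))),  # upper-right: v4
    ]


def build_corners(objects):
    corners = []
    for x1, y1, x2, y2, c in objects:
        corners.extend(corner_functions(x1, y1, x2, y2, c))
    return corners


def oifbs(corners, px, py):
    """Sum the functions of all corners dominated by p, then evaluate at p."""
    total = (0, 0, 0, 0)
    for (corner_x, corner_y), coeffs in corners:
        if corner_x <= px and corner_y <= py:
            total = add(total, coeffs)
    a_xy, a_x, a_y, a_1 = total
    return a_xy * px * py + a_x * px + a_y * py + a_1


# Hypothetical layout reproducing 4*(18*5) + 3*(2*11).
objects = [(2, 10, 25, 20, 4), (5, 3, 16, 5, 3)]
corners = build_corners(objects)
print(oifbs(corners, 20, 15))
```

The aggregate stored for a set of corners is a single coefficient tuple regardless of how many corners it covers, which is what lets a dominance-sum index carry functions instead of plain weights.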
61
Functional Box-Sum → OIFBS
A functional box-sum query can be transformed into four OIFBS queries.
62
[Figure: the functional box-sum of a query box equals one OIFBS, minus two others, plus a fourth.]
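The four-query transformation can be sketched with inclusion-exclusion over the corners of the query box. The code below assumes constant value functions, the origin at (0, 0), and objects and queries in the positive quadrant; the OIFBS itself is computed by brute force here, so the example shows only the transformation. Names and data are hypothetical.

```python
def overlap(lo1, hi1, lo2, hi2):
    """Length of the intersection of intervals [lo1, hi1] and [lo2, hi2]."""
    return max(0, min(hi1, hi2) - max(lo1, lo2))


def oifbs_by_scan(objects, px, py):
    """Functional sum over the box [origin, p] for constant-valued objects."""
    return sum(v * overlap(x1, x2, 0, px) * overlap(y1, y2, 0, py)
               for x1, y1, x2, y2, v in objects)


def fbs_by_scan(objects, q):
    """Reference answer: functional box-sum computed directly."""
    qx1, qy1, qx2, qy2 = q
    return sum(v * overlap(x1, x2, qx1, qx2) * overlap(y1, y2, qy1, qy2)
               for x1, y1, x2, y2, v in objects)


def fbs_via_oifbs(objects, q):
    """Functional box-sum as a signed combination of four OIFBS queries."""
    qx1, qy1, qx2, qy2 = q
    return (oifbs_by_scan(objects, qx2, qy2)
            - oifbs_by_scan(objects, qx1, qy2)
            - oifbs_by_scan(objects, qx2, qy1)
            + oifbs_by_scan(objects, qx1, qy1))


objects = [(2, 10, 25, 20, 4), (5, 3, 16, 5, 3)]
query = (10, 4, 20, 15)
print(fbs_via_oifbs(objects, query), fbs_by_scan(objects, query))
```

Each of the four OIFBS calls could in turn be answered by a dominance-sum over corner functions, which is the chain of reductions the summary slide describes.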
63
Summary for Our Functional Box-Sum Solution
We reduced one functional box-sum query to four OIFBS queries; we reduced the OIFBS problem to the dominance-sum problem; thus we can use the BA-tree for the functional box-sum computation.
64
Content
Motivating examples
Problem definition
Straightforward approaches
Our solutions: simple box-sum, functional box-sum
Performance
65
Experimental Setup
Sun Enterprise 250 Server, 8KB page size, 10MB buffer; 6 million random objects, size of each edge: roughly 1/10,000 of space. Our disk-based, dynamic BA-tree can be easily implemented. Implementation can be found at:
66
Implemented Algorithms
The BA-tree has over 200 times faster query performance than the plain R*-tree approach! We report the comparison against the improved aR-tree and omit the plain R*-tree. We also implemented two extensions of the dominance-sum data structure ECDF-tree [Bentley80] to a disk-based, dynamic-update environment.
67
Comparing Index Sizes
68
Simple Box-Sum Query Cost
69
Functional Box-Sum Query Cost
70
Conclusions
We solved two variations of the box-sum problem. We reduced each variation to dominance-sums and proposed the BA-tree. With about 4 times overhead in space, we achieved a 200x query improvement over the R*-tree approach and a 30x query improvement over the aR-tree approach.
71
Thank you!