Efficient Aggregation over Objects with Extent

Presentation on theme: "Efficient Aggregation over Objects with Extent"— Presentation transcript:

1 Efficient Aggregation over Objects with Extent
Donghui Zhang, Vassilis J. Tsotras, Dimitrios Gunopulos. Computer Science Department, University of California, Riverside. PODS’02. August 4, 2019

2 Content
Motivating examples; Problem definition; Straightforward approaches; Our solutions; Performance

3 Why Aggregation?
Aggregation: compute the total value over the subset of records that satisfy some selection condition (e.g. located in a region of interest). It is an important operator for data mining, on-line query processing, data warehousing, etc. Data volumes are large; with aggregation, the user can quickly get a good summary.

4 Why Objects with Extent?
Many applications (agricultural, meteorological, geo-spatial, etc.) produce data that have spatial (plus temporal) extent. For example, a rainfall record corresponds to a region, not a single point.

6 Motivating Example
A set of precipitation records, each with a region and a precipitation value. Given an arbitrary region, what is the total rainfall in this region this month?

7 Content
Motivating examples; Problem definition; Straightforward approaches; Our solutions; Performance

8 Problem Definition
For simplicity, we focus on rectangular objects and query regions (a complex region can be decomposed into boxes). Two problem variations: simple box-sum and functional box-sum.

9 Simple Box-Sum
Given a set of weighted rectangular objects and a query box q, compute the total weight of the objects intersecting q. Example (figure): box-sum = 3.
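The definition can be made concrete with a linear scan, which is also what "Straightforward Approach 1" in the talk amounts to. This is only an illustrative sketch: the `Box` type, the function names and the four sample objects are assumptions made here, not data from the slides.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-parallel rectangle (hypothetical helper type for this sketch)."""
    x1: float
    y1: float
    x2: float
    y2: float


def intersects(a: Box, b: Box) -> bool:
    """True if the closed boxes a and b share at least one point."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def simple_box_sum(objects, q: Box):
    """Linear scan: add the weight of every object whose box intersects q."""
    return sum(weight for box, weight in objects if intersects(box, q))


# Made-up sample data: four unit-weight objects.
objects = [
    (Box(0, 0, 2, 2), 1.0),
    (Box(1, 1, 4, 3), 1.0),
    (Box(3, 0, 5, 1), 1.0),
    (Box(6, 6, 7, 7), 1.0),
]
print(simple_box_sum(objects, Box(1.5, 0.5, 3.5, 2.5)))  # 3.0
```

Every object is inspected once, so the cost is O(n) per query regardless of how small the answer is.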

10 Functional Box-Sum
Example (figure): simple box-sum = 4 + 3 = 7.

11 Functional Box-Sum
In general, the value of an object can be a function. Example (figure): FBS = $\int_{7}^{11}\int_{15}^{20} (x-2)\,dx\,dy = 310$.

12 Functional Box-Sum
A set of objects, each having a box and a value function. Given a query box q, compute the total value of the objects intersecting q, where the contribution of an object is the integral of its value function over its intersection with q.
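For the special case where each object's value function is a constant, the integral is simply value times intersection area. The sketch below assumes that special case; the tuple layout `(x1, y1, x2, y2)`, the function names and the coordinates are hypothetical, chosen so that the arithmetic matches the 4*(18*5) + 3*(2*11) example used in the talk's origin-involved slide.

```python
def overlap_area(a, b):
    """Area of the intersection of two boxes given as (x1, y1, x2, y2)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0


def functional_box_sum(objects, q):
    """Brute force for constant value functions: value * intersection area."""
    return sum(value * overlap_area(box, q) for box, value in objects)


# Hypothetical coordinates: the overlaps with q are 18 x 5 and 2 x 11.
objects = [((2, 10, 25, 30), 4), ((18, 3, 30, 14), 3)]
q = (0, 0, 20, 15)
print(functional_box_sum(objects, q))  # 4*(18*5) + 3*(2*11) = 426
```

A non-constant value function would replace the `value * area` product with an integral over the intersection, as in the definition above.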

13 Content
Motivating examples; Problem definition; Straightforward approaches; Our solutions; Performance

14 Straightforward Approach 1
No index: scan through all objects. Obviously not efficient: query time is O(n).

15 Straightforward Approach 2
Index the objects using an R-tree [Guttman84] and reduce the problem to a range search. Optimize by storing aggregate information at internal nodes [LM01]. Nevertheless, the worst-case query time is still O(n).

16 Challenge
Can we compute the aggregate faster? Our approach: a specialized index; query time reduces to log2(n).

17 Content
Motivating examples; Problem definition; Straightforward approaches; Our solutions; Performance

18 Content
Motivating examples; Problem definition; Straightforward approaches; Our solutions (simple box-sum, functional box-sum); Performance

19 Our Solution for Simple Box-Sum
We reduce a box-sum query to a set of dominance-sum queries. We propose the BA-tree to answer the dominance-sum query.

20-21 Dominance-Sum
Given a set of weighted point objects and a query point p, compute the total weight of the objects dominated by p (i.e. to the lower left of p). Example (figure): dominance-sum = 18.
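As a reference point for the index structures that follow, the dominance-sum can be written as a direct scan. This is a sketch with invented sample points (picked so the answer is 18, like the figure); the `(x, y, weight)` tuple layout is an assumption.

```python
def dominance_sum(points, p):
    """Total weight of the points (x, y, weight) to the lower left of p."""
    px, py = p
    return sum(w for x, y, w in points if x <= px and y <= py)


# Made-up points; three of them are dominated by p = (5, 4).
points = [(1, 1, 5), (2, 3, 6), (4, 2, 7), (6, 1, 3), (3, 6, 4)]
print(dominance_sum(points, (5, 4)))  # 5 + 6 + 7 = 18
```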

22 Existing Reduction [EO82]
We proved that the reduction technique of [EO82] reduces a d-dimensional box-sum query into $\sum_{i=1}^{d} \binom{d}{i} 2^{i} = \Omega(3^{d})$ dominance-sum queries.

23 Our New Reduction
We reduce a d-dimensional simple box-sum query to $2^d$ dominance-sum queries. Comparison:

        [EO82]   Our Reduction
d=2     8        4
d=3     26       8

24 Our New Reduction
Key observation: in order for an object o to intersect query box q, the lower-left corner of o must be dominated by the upper-right corner of q.

25-26 [Figures: the box-sum written as a combination of four dominance-sums; surviving signs: - - +]
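One way to realize a four-query reduction in two dimensions that is consistent with the key observation and with the "- - +" signs is inclusion-exclusion: start from the objects whose lower-left corner is dominated by q's upper-right corner, subtract those lying entirely to the left of q and those entirely below q, and add back those that are both. The sketch below is written under that assumption and is not claimed to be the authors' exact formulation; all names and the sample data are hypothetical, and brute-force scans stand in for the four dominance-sum indexes.

```python
def dominance_sum(points, p, strict_x=False, strict_y=False):
    """Total weight of points (x, y, w) to the lower left of p.

    strict_x / strict_y switch the comparison from <= to < on that axis.
    """
    px, py = p
    total = 0
    for x, y, w in points:
        ok_x = x < px if strict_x else x <= px
        ok_y = y < py if strict_y else y <= py
        if ok_x and ok_y:
            total += w
    return total


def box_sum_via_dominance(objects, q):
    """Simple box-sum from four dominance-sums over four corner point sets."""
    qx1, qy1, qx2, qy2 = q
    ll = [(x1, y1, w) for (x1, y1, x2, y2), w in objects]  # lower-left corners
    lr = [(x2, y1, w) for (x1, y1, x2, y2), w in objects]  # lower-right corners
    ul = [(x1, y2, w) for (x1, y1, x2, y2), w in objects]  # upper-left corners
    ur = [(x2, y2, w) for (x1, y1, x2, y2), w in objects]  # upper-right corners
    candidates = dominance_sum(ll, (qx2, qy2))
    left = dominance_sum(lr, (qx1, qy2), strict_x=True)    # entirely left of q
    below = dominance_sum(ul, (qx2, qy1), strict_y=True)   # entirely below q
    both = dominance_sum(ur, (qx1, qy1), strict_x=True, strict_y=True)
    return candidates - left - below + both


def box_sum_scan(objects, q):
    """Reference answer by scanning all objects."""
    qx1, qy1, qx2, qy2 = q
    return sum(w for (x1, y1, x2, y2), w in objects
               if x1 <= qx2 and qx1 <= x2 and y1 <= qy2 and qy1 <= y2)


# Made-up objects: ((x1, y1, x2, y2), weight).
objects = [((0, 0, 2, 2), 2), ((1, 1, 4, 3), 5), ((3, 0, 5, 1), 1), ((6, 6, 7, 7), 7)]
q = (3, 2, 5, 4)
print(box_sum_via_dominance(objects, q), box_sum_scan(objects, q))  # 5 5
```

In an index-based setting each of the four corner sets would live in its own dominance-sum structure, which fits the later remark about maintaining several BA-trees together.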

27 BA-tree (for dominance-sum)
1-dimensional: an augmented B+-tree. Along with each child pointer in an index node, store the total weight of the points in the sub-tree. Query, update: O(log(n)).
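The augmentation idea can be illustrated in memory with a much simpler stand-in: a plain binary search tree in which every node keeps the total weight of its subtree. This is not a B+-tree (it is binary, not disk-based, and not rebalanced, so the O(log n) bound only holds when the tree happens to be balanced); the class and method names are assumptions for this sketch.

```python
class Node:
    def __init__(self, key, weight):
        self.key = key
        self.weight = weight      # weight stored at this key
        self.subtotal = weight    # total weight of the subtree rooted here
        self.left = None
        self.right = None


class AugmentedTree:
    """Unbalanced BST with subtree totals (stand-in for the augmented B+-tree)."""

    def __init__(self):
        self.root = None

    def insert(self, key, weight):
        if self.root is None:
            self.root = Node(key, weight)
            return
        node = self.root
        while True:
            node.subtotal += weight           # every node on the path gains the weight
            if key == node.key:
                node.weight += weight
                return
            if key < node.key:
                if node.left is None:
                    node.left = Node(key, weight)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(key, weight)
                    return
                node = node.right

    def prefix_sum(self, key):
        """Total weight of all stored keys <= key (1-d dominance-sum)."""
        total = 0
        node = self.root
        while node is not None:
            if key < node.key:
                node = node.left
            else:
                total += node.weight
                if node.left is not None:
                    total += node.left.subtotal   # whole left subtree qualifies
                node = node.right
        return total


tree = AugmentedTree()
for k, w in [(5, 2), (3, 1), (8, 4), (1, 3), (4, 5)]:
    tree.insert(k, w)
print(tree.prefix_sum(4))  # keys 1, 3, 4 -> 3 + 1 + 5 = 9
```

The query follows a single root-to-leaf path and uses stored subtotals instead of visiting whole subtrees, which is the property the BA-tree carries over to higher dimensions.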

28 BA-tree (higher dimensions)
An augmented k-d-B-tree. The k-d-B-tree [Robinson81] indexes point objects; each index record corresponds to a rectangular region; the region of a parent is fully partitioned by the regions of its children.

29-35 k-d-B-tree [Figure sequence: example k-d-B-tree with root R and regions A, B, C]

36 k-d-B-tree
Compute the dominance-sum for point p by examining all children that intersect the rectangle [origin, p]. In this example: A, C, D, E, F, H.

37 BA-tree
Motivation for augmentation: examine a single child! The rectangle [origin, p] can be divided into four parts...

38 BA-tree: the part dominated by F’s lower-left corner

39 BA-tree: the part to the left of F

40 BA-tree: the part below F

41 BA-tree: the intersection with F

42 BA-tree
Compute the total weight of points in these four regions separately and add them up!

43 BA-tree
Total weight of objects in the region dominated by F’s lower-left corner: a single value (independent of where p is); augment F with this value (called subtotal).

44 BA-tree
Total weight of objects in the region to the left of F: computed via a 1-dimensional BA-tree (called y-border) for the y values of all objects to the left of F.

45 BA-tree
Total weight of objects in the region below F: computed via a 1-dimensional BA-tree (called x-border) for the x values of all objects below F.

46 BA-tree
For the part that intersects F, examine the sub-tree rooted at F. Only one child is examined, and thus a single path from root to leaf.
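The four-part decomposition can be checked numerically for a single child region F. In the sketch below plain scans play the role of the stored subtotal, the two border structures and the recursive descent into F; treating F as closed on its lower and left edges, and all names and sample points, are assumptions made for illustration.

```python
def dominance_sum(points, p):
    """Reference: total weight of points (x, y, w) to the lower left of p."""
    px, py = p
    return sum(w for x, y, w in points if x <= px and y <= py)


def decomposed_dominance_sum(points, region, p):
    """Dominance-sum of p, split into the four parts relative to child region F.

    region = (fx1, fy1, fx2, fy2) is F; p is assumed to lie inside F.
    """
    fx1, fy1, fx2, fy2 = region
    px, py = p
    # Part 1 (subtotal): dominated by F's lower-left corner, independent of p.
    subtotal = sum(w for x, y, w in points if x < fx1 and y < fy1)
    # Part 2 (y-border): left of F, selected by y value up to p's y.
    y_border = sum(w for x, y, w in points if x < fx1 and fy1 <= y <= py)
    # Part 3 (x-border): below F, selected by x value up to p's x.
    x_border = sum(w for x, y, w in points if y < fy1 and fx1 <= x <= px)
    # Part 4: inside F; a real index would recurse into F's sub-tree here.
    inside = sum(w for x, y, w in points if fx1 <= x <= px and fy1 <= y <= py)
    return subtotal + y_border + x_border + inside


# Made-up points (x, y, weight) and child region F.
points = [(1, 1, 2), (2, 6, 3), (5, 2, 4), (5, 5, 1), (7, 7, 6), (9, 1, 8), (3, 9, 5)]
F = (4, 4, 8, 8)
p = (6, 7)
print(decomposed_dominance_sum(points, F, p), dominance_sum(points, p))  # 10 10
```

Parts 1 to 3 are answered from information attached to F, so only part 4 requires descending further, which is why a query follows one root-to-leaf path.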

47-48 BA-tree
Insertion: besides the k-d-B-tree insertion (into the sub-tree of C), update the subtotal (of F, G), the x-border (of B) and the y-border (of H).

49 Summary for Our Simple Box-Sum Solution
We proposed the BA-tree, a dominance-sum index: a k-d-B-tree augmented with subtotal, x-border and y-border. Due to our reduction, by maintaining several BA-trees together, we can compute the simple box-sum in poly-logarithmic time (assuming a balanced tree).

50 Content
Motivating examples; Problem definition; Straightforward approaches; Our solutions (simple box-sum, functional box-sum); Performance

51 Our Solution for Functional Box-Sum
First, focus on a special case: OIFBS (Origin-Involved Functional Box-Sum), where the query box contains the origin of space.

52 Functional Box-Sum

53 Origin-Involved special case
OIFBS = 4*(18*5) + 3*(2*11) = 426.

54 Our Solution for Functional Box-Sum
We reduce an OIFBS query to a dominance-sum query (solvable using the BA-tree). We show how the Functional Box-Sum query can be computed via a set of OIFBS queries.

55 OIFBS → dominance-sum
Idea: to insert a rectangular object, insert its four corners, associating a function with each corner. The functions should satisfy: for any point p in space, the contribution of each object to the OIFBS is equal to the sum of the functions at its corners dominated by p. To compute an OIFBS regarding box [origin, p], compute the dominance-sum regarding p.

56 For any point (x, y) in the object region, the contribution of the object to an OIFBS query at this point is: 4(x-2)(y-10). We insert the lower-left corner along with the above function.

57 In general, at the lower-left corner $(x_1, y_1)$ of an object with value function f(x, y), we should insert:
$\int_{x_1}^{x}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$
The functions at the other corners?

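58 If (x, y) is to the right of the object, the contribution of the object to the OIFBS is:
$\int_{x_1}^{x_2}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$
It should equal the sum of the functions at the two dominated corners (lower-left and lower-right). Writing g1 for the function at the lower-left corner and g2 for the integral above, the function at the lower-right corner is g2-g1.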

59 We have proved that:
at lower-left: insert v1 = g1
at lower-right: insert v2 = g2 - g1
at upper-left: insert v3 = g3 - g1
at upper-right: insert v4 = g1 + g4 - g2 - g3
where
$g_1 = \int_{x_1}^{x}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$
$g_2 = \int_{x_1}^{x_2}\int_{y_1}^{y} f(x', y')\,dy'\,dx'$
$g_3 = \int_{x_1}^{x}\int_{y_1}^{y_2} f(x', y')\,dy'\,dx'$
$g_4 = \int_{x_1}^{x_2}\int_{y_1}^{y_2} f(x', y')\,dy'\,dx'$

60 OIFBS → dominance-sum
If f(x, y) is a polynomial of degree k, then vi(x, y) are polynomials of degree k+2. E.g. in a previous example, f(x, y)=4, while v1(x, y)=4(x-2)(y-10). Such functions can be represented in constant space and can be combined or evaluated efficiently.
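The corner-function scheme can be tried out for constant value functions, where every v_i is a polynomial of the form a + b*x + c*y + d*x*y and can be stored as four coefficients. The sketch below assumes constant values, objects in the positive quadrant and a query box [origin, p]; the coefficient-tuple representation, the function names and the coordinates are choices made here for illustration, and a plain scan over the corners stands in for the dominance-sum index.

```python
def sub(p, q):
    """Coefficient-wise difference of two polynomials (a, b, c, d)."""
    return tuple(u - v for u, v in zip(p, q))


def add(p, q):
    """Coefficient-wise sum of two polynomials (a, b, c, d)."""
    return tuple(u + v for u, v in zip(p, q))


def corner_functions(box, c):
    """Four (corner, polynomial) pairs for a box with constant value c.

    A polynomial (a, b, cy, d) stands for a + b*x + cy*y + d*x*y.
    """
    x1, y1, x2, y2 = box
    g1 = (c * x1 * y1, -c * y1, -c * x1, c)          # c (x - x1)(y - y1)
    g2 = (-c * (x2 - x1) * y1, 0, c * (x2 - x1), 0)  # c (x2 - x1)(y - y1)
    g3 = (-c * x1 * (y2 - y1), c * (y2 - y1), 0, 0)  # c (x - x1)(y2 - y1)
    g4 = (c * (x2 - x1) * (y2 - y1), 0, 0, 0)        # c (x2 - x1)(y2 - y1)
    return [
        ((x1, y1), g1),                              # lower-left:  v1 = g1
        ((x2, y1), sub(g2, g1)),                     # lower-right: v2 = g2 - g1
        ((x1, y2), sub(g3, g1)),                     # upper-left:  v3 = g3 - g1
        ((x2, y2), sub(sub(add(g1, g4), g2), g3)),   # upper-right: v4 = g1 + g4 - g2 - g3
    ]


def build_corners(objects):
    """Flatten all objects into one list of weighted corner points."""
    corners = []
    for box, c in objects:
        corners.extend(corner_functions(box, c))
    return corners


def oifbs_via_dominance(corners, p):
    """Sum the polynomials at the corners dominated by p, then evaluate at p."""
    px, py = p
    total = (0, 0, 0, 0)
    for (cx, cy), poly in corners:
        if cx <= px and cy <= py:
            total = add(total, poly)
    a, b, cy_coeff, d = total
    return a + b * px + cy_coeff * py + d * px * py


# Hypothetical coordinates matching the 4*(18*5) + 3*(2*11) arithmetic.
objects = [((2, 10, 25, 30), 4), ((18, 3, 30, 14), 3)]
corners = build_corners(objects)
print(oifbs_via_dominance(corners, (20, 15)))  # 426
```

Because the dominated polynomials are added coefficient by coefficient before evaluation, an index only needs to aggregate fixed-size coefficient vectors, which is what makes the constant-space remark above matter.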

61 Functional Box-Sum → OIFBS
A functional box-sum query can be transformed into four OIFBS queries.

62 [Figure: the functional box-sum expressed as a signed combination of four OIFBS queries; surviving signs: = - - +]
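The figure is presumably the usual inclusion-exclusion identity over the four corners of the query box. Assuming all objects lie in the positive quadrant, writing OIFBS(x, y) for the origin-involved query with box [origin, (x, y)], and taking q = [x_1, x_2] × [y_1, y_2] (symbols chosen here for illustration), it would read:

```latex
\mathrm{FBS}(q) \;=\; \mathrm{OIFBS}(x_2, y_2) \;-\; \mathrm{OIFBS}(x_1, y_2) \;-\; \mathrm{OIFBS}(x_2, y_1) \;+\; \mathrm{OIFBS}(x_1, y_1)
```

The two subtracted terms remove the strips to the left of and below q, and the last term restores the lower-left rectangle that was subtracted twice.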

63 Summary for Our Functional Box-Sum Solution
We reduced one functional box-sum query to four OIFBS queries. We reduced the OIFBS problem to the dominance-sum problem. Thus we can use the BA-tree for the functional box-sum computation.

64 Content
Motivating examples; Problem definition; Straightforward approaches; Our solutions (simple box-sum, functional box-sum); Performance

65 Experimental Setup
Sun Enterprise 250 Server, 8KB page size, 10MB buffer. 6 million random objects; size of each edge: roughly 1/10,000 of space. Our disk-based, dynamic BA-tree can be easily implemented. Implementation can be found at:

66 Implemented Algorithms
The BA-tree has over 200 times faster query performance than the plain R*-tree approach, so we report the comparison against the improved aR-tree and omit the plain R*-tree. Also implemented: two extensions of the dominance-sum data structure ECDF-tree [Bentley80] to a disk-based, dynamic-update environment.

67 Comparing Index Sizes

68 Simple Box-Sum Query Cost

69 Functional Box-Sum Query Cost

70 Conclusions
We solved two variations of the box-sum problem. We reduced each variation to dominance-sums and proposed the BA-tree. With about 4 times overhead in space, we achieved 200x query improvement over the R*-tree approach and 30x query improvement over the aR-tree approach.

71 Thank you!

