Published by Arron Martin. Modified over 9 years ago.
1
Improving Min/Max Aggregation over Spatial Objects
Donghui Zhang, Vassilis J. Tsotras
University of California, Riverside
ACM GIS’01
2
Outline
Problem Definition
Straightforward Solutions
Our Solution
Performance Results
By-Product: Optimized the MSB-tree
Conclusions
3
Problem Definition
Consider a collection of spatial objects. Each object has a rectangle r and a value v.
Spatial aggregation: find the aggregate value over all objects intersecting a given query rectangle. We focus on MAX.
E.g.: a database of rainfalls over geographical areas. Find the max rainfall in the Los Angeles area.
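The problem statement above can be made concrete with a brute-force baseline that scans every object. This is only an illustrative sketch: the names `Rect`, `intersects`, and `max_over`, the closed-rectangle convention, and the sample rainfall data are assumptions, not part of the original slides.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Rect(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float


def intersects(a: Rect, b: Rect) -> bool:
    # Assumption: rectangles are closed, so touching edges count as intersecting.
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def max_over(objects: Iterable[Tuple[Rect, float]], query: Rect) -> Optional[float]:
    """Return the MAX value over objects intersecting `query`, or None if none do."""
    best = None
    for box, value in objects:
        if intersects(box, query) and (best is None or value > best):
            best = value
    return best


# Hypothetical rainfall readings: (area, rainfall).
rainfall = [
    (Rect(0, 0, 4, 4), 12.0),
    (Rect(3, 3, 8, 8), 30.5),
    (Rect(10, 10, 12, 12), 50.0),
]
print(max_over(rainfall, Rect(2, 2, 5, 5)))  # 30.5
```

The linear scan costs time proportional to the number of objects per query, which is what the index structures on the following slides aim to avoid.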
4
Straightforward Solutions
Use an R*-tree [BKS+90] to index the objects. This reduces the problem to a range search.
Better approach: the aR-tree [PKZ+01, LM01]. Store the MAX of each sub-tree in the internal nodes; if the query rectangle contains a sub-tree, there is no need to search it.
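A minimal sketch of the aR-tree search idea follows, assuming a simplified in-memory node (`Node`, `mbr`, `max_value`, and `ar_max` are hypothetical names). A node without children stands for a single object. Real aR-trees are disk-based and built on R-tree insertion logic, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


@dataclass
class Node:
    mbr: Box                 # bounding box of everything below this node
    max_value: float         # MAX over all objects below this node
    children: List["Node"] = field(default_factory=list)  # empty for an object


def ar_max(node: Node, query: Box, best: Optional[float] = None) -> Optional[float]:
    if not intersects(node.mbr, query):
        return best
    # If the query contains the whole sub-tree, every object below intersects
    # the query, so the stored aggregate is the answer for this sub-tree.
    if contains(query, node.mbr) or not node.children:
        return node.max_value if best is None else max(best, node.max_value)
    for child in node.children:
        best = ar_max(child, query, best)
    return best


a = Node((0, 0, 2, 2), 5.0)
b = Node((3, 3, 5, 5), 9.0)
c = Node((8, 8, 9, 9), 7.0)
left = Node((0, 0, 5, 5), 9.0, [a, b])
root = Node((0, 0, 9, 9), 9.0, [left, c])
print(ar_max(root, (0, 0, 6, 6)))  # 9.0, answered at `left` without visiting a or b
```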
6
Our Solution: Overview
The MR-tree: a specialized index for Min/Max aggregation. It uses the R*-tree and four optimization techniques:
k-max: increase the chance for the search algorithm to stop at higher tree levels;
box-elimination: erase information from the tree that will not contribute to any query;
union: do not insert an object which will not contribute to any query;
area-reduction: reduce the area of the object to be inserted.
7
The k-max Optimization
Motivation: the aR-tree is not efficient if the query rectangle intersects but does not fully contain a sub-tree rectangle.
9
The k-max Optimization
Along with each index record r, store the k max-value objects in sub-tree(r).
Upon query, if the query rectangle intersects any of the k objects at r, omit sub-tree(r): the highest-valued of the k objects that intersects the query already gives the MAX for the sub-tree.
Trade-off: larger k → more sub-trees omitted during query, but also more space and update cost.
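The query rule above can be sketched as follows, assuming each node keeps its k max-value objects in a list sorted by value in descending order. The node layout and the names `top_k`, `objects`, and `kmax_query` are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle
Obj = Tuple[Box, float]                  # (box, value)


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Box
    top_k: List[Obj]                                       # k max-value objects below, descending by value
    children: List["Node"] = field(default_factory=list)   # internal nodes only
    objects: List[Obj] = field(default_factory=list)       # leaf nodes only


def kmax_query(node: Node, query: Box) -> Optional[float]:
    if not intersects(node.mbr, query):
        return None
    # Scan the k stored objects from highest value down. The first one that
    # intersects the query is the sub-tree's answer: every higher-valued
    # object was just checked and missed, and all others have lower values.
    for box, value in node.top_k:
        if intersects(box, query):
            return value
    # None of the k objects helps, so fall back to searching below.
    found = [v for b, v in node.objects if intersects(b, query)]
    for child in node.children:
        r = kmax_query(child, query)
        if r is not None:
            found.append(r)
    return max(found, default=None)


objs = [((0, 0, 1, 1), 3.0), ((2, 2, 3, 3), 8.0), ((4, 4, 5, 5), 6.0)]
leaf = Node(mbr=(0, 0, 5, 5), top_k=[objs[1], objs[2]], objects=objs)  # k = 2
print(kmax_query(leaf, (4, 4, 6, 6)))  # 6.0, found in top_k without scanning the leaf
```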
10
The box-elimination Optimization
Motivation: if for objects o1 and o2, o1.box contains o2.box and o1.value ≥ o2.value, then o2 is obsolete, i.e. it does not contribute to any query and thus can be deleted.
11
The box-elimination Optimization
Similar for an object o1 and an index record r2: if o1.box contains r2.box and o1.value ≥ the max value in sub-tree(r2), the whole sub-tree is obsolete.
The optimization: at insertion, remove obsolete objects and sub-trees along the insertion path.
Ideally, remove all obsolete objects/sub-trees, but this is too expensive. Instead, pick c (c: constant) paths.
Trade-off: larger c → smaller index size and faster query time, but also more update time.
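The obsolescence test can be sketched as a filter over the entries of one node met during insertion. This assumes each entry is a (box, value) pair, where for an index record the value is the max of its sub-tree; `prune_obsolete` is a hypothetical helper, and choosing which c paths to visit is left out.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle
Entry = Tuple[Box, float]                # object: (box, value); index record: (box, sub-tree max)


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def prune_obsolete(entries: List[Entry], new_box: Box, new_value: float) -> List[Entry]:
    """Drop entries made obsolete by the new object for MAX queries.

    An entry is obsolete when the new box contains its box and the new value
    is at least its (max) value: any query touching the entry also touches
    the new object, which answers with an equal or larger value.
    """
    return [
        (box, value)
        for box, value in entries
        if not (contains(new_box, box) and new_value >= value)
    ]


node_entries = [((1, 1, 2, 2), 4.0), ((0, 0, 9, 9), 1.0), ((3, 3, 4, 4), 7.0)]
print(prune_obsolete(node_entries, (0, 0, 5, 5), 5.0))
# [((0, 0, 9, 9), 1.0), ((3, 3, 4, 4), 7.0)]
```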
12
The union Optimization
Motivation 1: if a new object o1 is obsolete due to an existing object o2, o1 should not be inserted.
Motivation 2: a new object o1 may be obsolete due to the union of several existing objects.
14
The union Optimization
Along with each index record r, store the union of the boxes of all objects in sub-tree(r); also store the MIN value of all these objects.
Do not perform the insertion of object o1 if o1.box is contained in r.union and o1.value ≤ r.min.
Question: how is the union computed and stored?
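A hedged sketch of the insertion test follows. It assumes r.union is stored as a small list of boxes and, as a simplification, only accepts containment in a single stored box. Testing containment in the union of several boxes is more involved and is not attempted here. The function name and parameters are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def can_skip_insert(union_boxes: List[Box], min_value: float,
                    new_box: Box, new_value: float) -> bool:
    """True if the new object is certainly obsolete with respect to record r.

    Conservative simplification (an assumption of this sketch): the new box
    must lie inside ONE of the stored boxes. A False result therefore only
    means "not proven obsolete"; the object is then inserted normally.
    """
    if new_value > min_value:
        return False
    return any(contains(u, new_box) for u in union_boxes)


r_union = [(0, 0, 10, 4), (0, 0, 4, 10)]  # hypothetical stored boxes for r
r_min = 6.0
print(can_skip_insert(r_union, r_min, (1, 1, 3, 9), 5.0))  # True
print(can_skip_insert(r_union, r_min, (1, 1, 3, 9), 7.0))  # False
```

The test is safe because every point of the stored boxes is covered by some object of value at least r.min, so a query touching the new box also touches an object with an equal or larger value.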
15
The union Optimization
Store an approximate union representation using t (t: constant) boxes. The approximation should be fully contained in the actual union, and should cover as much space as possible.
Def: given a set of n boxes S = {s_1, …, s_n}, the covered t-union of S is a set of t boxes A = {a_1, …, a_t} such that ∪ s_i covers ∪ a_i, and ∪ a_i covers the maximum area possible.
16
The union Optimization
To compute the exact covered t-union: O(n^(2t+4)).
We propose a much faster approximate algorithm: O(n log n).
Idea of our algorithm: pick the t largest boxes and expand them.
17
The area-reduction Optimization
Motivation: the box of a new object o1 can be reduced if an existing object o2 intersects it with a larger or equal value.
19
The area-reduction Optimization
Reduce the area of a new object o1 when:
∃ index record r s.t. r.union intersects o1.box and r.min ≥ o1.value, or
one of the k max-value objects intersects o1 with a larger or equal value, or
∃ leaf object o2 s.t. o2.box intersects o1.box and o2.value ≥ o1.value.
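One way the reduction itself might look is sketched below, under the assumption that the result must remain a single rectangle: the new box is trimmed only when the covering box spans it completely in one dimension and overlaps one of its ends in the other. The function `reduce_box` is hypothetical, and the caller is assumed to have already checked that the covering box carries a larger or equal value.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle


def reduce_box(new: Box, cover: Box) -> Optional[Box]:
    """Trim `new` by the part hidden under `cover` (whose value is >= new's).

    Returns None if `new` is fully covered (obsolete), the trimmed box if one
    side can be cut off, or `new` unchanged if trimming would not leave a
    single rectangle.
    """
    x1, y1, x2, y2 = new
    cx1, cy1, cx2, cy2 = cover
    if cx1 <= x1 and cy1 <= y1 and cx2 >= x2 and cy2 >= y2:
        return None                      # fully covered
    if cy1 <= y1 and cy2 >= y2:          # cover spans the full height of new
        if cx1 <= x1 <= cx2:
            x1 = cx2                     # cut off the left part
        elif cx1 <= x2 <= cx2:
            x2 = cx1                     # cut off the right part
    elif cx1 <= x1 and cx2 >= x2:        # cover spans the full width of new
        if cy1 <= y1 <= cy2:
            y1 = cy2                     # cut off the bottom part
        elif cy1 <= y2 <= cy2:
            y2 = cy1                     # cut off the top part
    return (x1, y1, x2, y2)


print(reduce_box((2, 2, 6, 4), (0, 0, 4, 5)))  # (4, 2, 6, 4)
print(reduce_box((2, 2, 6, 4), (0, 0, 9, 9)))  # None
```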
21
The area-reduction Optimization
Benefit 1: reduce overlap among sibling nodes.
Benefit 2: increase the chance of making new objects obsolete.
22
Performance Results
Datasets: 5 million square objects, size randomly chosen from 10 to 10000 (space in each dimension is 1 to one million).
Implemented algorithms:
R*: the R*-tree [BKS+90];
aR: the aR-tree [PKZ+01, LM01];
kaR: the aR-tree with the k-max optimization;
MR: the MR-tree (with all the optimizations).
23
Performance Results: Index Sizes (figure)
24
Performance Results: Query Performance (log scale) (figure)
Query time is the total over 100 random queries with the same query rectangle size.
25
Optimizing the MSB-tree
The MSB-tree [YW00]: efficiently maintains and computes MIN/MAX aggregates over 1-dim interval data.
Insertion/Query: O(log_B m), where B is the page capacity and m is the number of leaf records.
[YW00]: periodically reconstruct the whole tree to maintain a small m. During reconstruction, the index is off-line.
Reconstruction can be avoided by applying the box-elimination optimization.
Idea: if a new interval contains all intervals in a sub-tree with a larger value, the sub-tree is obsolete.
26
Conclusions
Addressed the MIN/MAX aggregation problem over spatial objects;
Four optimization techniques;
The MR-tree;
Much smaller index size and query time;
By-product: optimized the MSB-tree.