Lars Arge1, Mark de Berg2, Herman Haverkort3 and Ke Yi1

Uploaded: 2017-11-29T11:43:35+00:00

The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree
Lars Arge (1), Mark de Berg (2), Herman Haverkort (3) and Ke Yi (1)
(1) Department of Computer Science, Duke University
(2) TU Eindhoven
(3) Institute of Information and Computing Sciences, Utrecht University

Problem Definition
  Input: N rectangles in the plane; a window query Q
  Output: all rectangles intersecting Q
  Applications: spatial databases, GIS, CAD, computer vision, robotics, …
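As a baseline for the problem statement above, here is a minimal sketch of the intersection test and a brute-force window query. The tuple layout (xmin, ymin, xmax, ymax) and the function names are assumptions made for illustration, not part of the talk.

```python
from typing import List, Tuple

# Assumed layout for illustration: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]

def intersects(r: Rect, q: Rect) -> bool:
    """True if the closed rectangles r and q share at least one point."""
    return r[0] <= q[2] and q[0] <= r[2] and r[1] <= q[3] and q[1] <= r[3]

def window_query_bruteforce(rects: List[Rect], q: Rect) -> List[Rect]:
    """Scan all N rectangles; an index such as an R-tree aims to avoid this scan."""
    return [r for r in rects if intersects(r, q)]
```

Every index discussed in the talk must return exactly the same set as this scan; only the number of blocks touched differs.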

R-Tree
  Definition [Guttman84]
  Fanout: Θ(B), where B is the disk block size
  Advantages: little redundancy; multi-purpose; easy to update
  (Figure: example R-tree over rectangles A to I and the corresponding tree.)
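To make the structure concrete, the following is a small sketch of an R-tree node and the standard top-down window query that descends only into children whose bounding box intersects Q. The class and helper names (`Node`, `leaf`, `internal`, `query`) and the `stats` counter are hypothetical and chosen for illustration; a real R-tree would store Θ(B) entries per node on disk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def bbox(rects: List[Rect]) -> Rect:
    """Minimal bounding box of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

@dataclass
class Node:
    box: Rect
    children: List["Node"] = field(default_factory=list)  # used by internal nodes
    rects: List[Rect] = field(default_factory=list)       # used by leaves

def leaf(rects: List[Rect]) -> Node:
    return Node(bbox(rects), rects=list(rects))

def internal(children: List[Node]) -> Node:
    return Node(bbox([c.box for c in children]), children=list(children))

def query(node: Node, q: Rect, stats: Dict[str, int]) -> List[Rect]:
    """Report rectangles intersecting q and count the nodes visited."""
    stats["visited"] += 1
    if not node.children:
        return [r for r in node.rects if intersects(r, q)]
    out: List[Rect] = []
    for child in node.children:
        if intersects(child.box, q):
            out.extend(query(child, q, stats))
    return out
```

The `visited` counter stands in for the I/O cost: each visited node corresponds to one block read.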

How to Build an R-Tree
  Repeated insertions
    [Guttman84]
    R+-tree [Sellis et al. 87]
    R*-tree [Beckmann et al. 90]
  Bulk-loading
    Hilbert R-tree [Kamel and Faloutsos 94]
    Top-down Greedy Split [Garcia et al. 98]
    Advantages: much faster than repeated insertions; better space utilization; usually produces R-trees of higher quality

R-Tree Variant: Hilbert R-Tree
  (Figure: Hilbert curve.)
  To build a Hilbert R-tree (cost: O((N/B) log_{M/B} N) I/Os):
    Sort the rectangles by the Hilbert values of their centers
    Build a B-tree on top
  Variant: 4D Hilbert R-tree
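The two build steps above can be sketched in memory as follows. The Hilbert index uses the standard iterative bit-manipulation conversion from grid cell to curve position. The grid resolution `n`, the quantization of centers, and the function names are assumptions for illustration. Only the leaf level is packed here; upper levels would be formed the same way from the leaf bounding boxes.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve on an n-by-n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_pack_leaves(rects: List[Rect], B: int, n: int = 1 << 16) -> List[List[Rect]]:
    """Sort rectangles by the Hilbert value of their centers and cut into leaves of size B."""
    x0 = min(r[0] for r in rects); x1 = max(r[2] for r in rects)
    y0 = min(r[1] for r in rects); y1 = max(r[3] for r in rects)

    def cell(v: float, lo: float, hi: float) -> int:
        # Quantize a coordinate to a grid cell in [0, n-1].
        if hi == lo:
            return 0
        return min(n - 1, int((v - lo) / (hi - lo) * n))

    def key(r: Rect) -> int:
        cx, cy = (r[0] + r[2]) / 2, (r[1] + r[3]) / 2
        return hilbert_index(n, cell(cx, x0, x1), cell(cy, y0, y1))

    ordered = sorted(rects, key=key)
    return [ordered[i:i + B] for i in range(0, len(ordered), B)]
```

In the external-memory setting the in-memory `sorted` call would be replaced by an external sort, which is where the O((N/B) log_{M/B} N) bound comes from.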

R-Tree Variant: TGS R-Tree
(Top-down Greedy Split)
  To build a TGS R-tree:
    Start from the root and build the tree top-down
    To build one node, use binary cuts until the desired fan-out is reached
    To make a binary cut, consider 4 orderings of the rectangles: xmin, ymin, xmax, ymax
    In each ordering, consider the B cutting positions
    Choose the one that minimizes the sum of the areas of the two resulting bounding boxes
  Typical bulk-load cost: O((N/B) log_2 N) I/Os
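A single binary cut as described above might look like the sketch below. The `step` parameter (spacing between candidate cut positions) and the function names are assumptions for illustration. The bounding boxes are recomputed naively for each candidate, so this shows the selection rule rather than an efficient implementation.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def bbox(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(b: Rect) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])

def best_binary_cut(rects: List[Rect], step: int) -> Optional[Tuple[float, List[Rect], List[Rect]]]:
    """Try cuts at multiples of `step` in the xmin, ymin, xmax and ymax orderings.

    Returns (cost, left, right) for the cut minimizing the sum of the two
    bounding-box areas, or None if no cut position exists.
    """
    best = None
    for k in range(4):  # 0: xmin, 1: ymin, 2: xmax, 3: ymax
        order = sorted(rects, key=lambda r: r[k])
        for i in range(step, len(order), step):
            left, right = order[:i], order[i:]
            cost = area(bbox(left)) + area(bbox(right))
            if best is None or cost < best[0]:
                best = (cost, left, right)
    return best
```

Applying the cut recursively to the larger parts until the node has the desired number of children gives one node of the tree; the children are then built the same way.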

Our Results
  None of the existing R-tree variants has a worst-case query performance guarantee!
    In the worst case, a query can visit all nodes in the tree even when the output size is zero
  Priority R-Tree
    The first R-tree variant that answers a query by visiting O(√(N/B) + T/B) nodes in the worst case
      T: output size
    It is optimal!
      There exists a dataset such that for any R-tree, there is an empty query that visits Ω(√(N/B)) nodes. [Kanth and Singh 99, Agarwal et al. 02]

Roadmap
  Pseudo-PR-tree
    Has the desired worst-case guarantee
    Not a real R-tree
  Transform a pseudo-PR-tree into a PR-tree
    A real R-tree
    Maintains the worst-case guarantee
  Experiments
    PR-tree
    Hilbert R-tree (2D and 4D)
    TGS R-tree

Building a Pseudo-PR-Tree
  Step 1: take out the B extreme rectangles in each direction and put them into priority leaves
  (Figure: the root with its priority leaves.)

Building a Pseudo-PR-Tree
  Step 2: divide by the xmin coordinates and build subtrees recursively. The division uses xmin, ymin, xmax, ymax in a round-robin fashion, like a 4D kd-tree.
  Analysis sketch:
    # nodes with at least one priority leaf completely reported: O(T/B)
    # nodes with no priority leaf completely reported: O(√(N/B))
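Steps 1 and 2 can be put together in a short in-memory sketch. It assumes that "extreme" means smallest xmin, smallest ymin, largest xmax and largest ymax, taken in that order from the rectangles still remaining, and that the split in Step 2 is at the median. The class and function names are hypothetical, and the input is assumed to be non-empty.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def bbox(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

class PseudoPRNode:
    """One node: up to four priority leaves plus up to two recursive subtrees."""

    def __init__(self, rects: List[Rect], B: int, depth: int = 0):
        self.box = bbox(rects)
        self.priority_leaves: List[List[Rect]] = []
        self.subtrees: List["PseudoPRNode"] = []
        rest = list(rects)
        if len(rest) <= B:
            self.priority_leaves.append(rest)  # small set: a single leaf
            return
        # Step 1: peel off the B most extreme rectangles in each direction.
        for dim, largest in ((0, False), (1, False), (2, True), (3, True)):
            if not rest:
                break
            rest.sort(key=lambda r: r[dim], reverse=largest)
            self.priority_leaves.append(rest[:B])
            rest = rest[B:]
        # Step 2: split the remainder at the median of the round-robin
        # coordinate (xmin, ymin, xmax, ymax by depth), like a 4D kd-tree.
        if rest:
            rest.sort(key=lambda r: r[depth % 4])
            mid = len(rest) // 2
            for half in (rest[:mid], rest[mid:]):
                if half:
                    self.subtrees.append(PseudoPRNode(half, B, depth + 1))

def query(node: PseudoPRNode, q: Rect, out: List[Rect]) -> None:
    """Append all stored rectangles intersecting q to `out`."""
    for leaf in node.priority_leaves:
        out.extend(r for r in leaf if intersects(r, q))
    for child in node.subtrees:
        if intersects(child.box, q):
            query(child, q, out)
```

Each node has at most six children (four priority leaves and two subtrees), which is why the structure is not yet a real R-tree with fanout Θ(B).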

Pseudo-PR-Tree to a Real R-tree

Query Complexity Remains Unchanged
  # nodes visited on leaf level: O(√(N/B) + T/B)
  Next level:
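One way to see why the bound can survive the transformation is sketched below. It assumes, which this transcript does not spell out, that the real PR-tree is built bottom-up: level 0 consists of the leaves of a pseudo-PR-tree on the N input rectangles, and level i consists of the leaves of a pseudo-PR-tree built on the bounding boxes of level i-1. Under that assumption the nodes visited on one level play the role of the output for the next level. Here V_i (a symbol introduced only for this sketch) denotes the number of nodes visited on level i.

```latex
\begin{align*}
V_0 &= O\!\left(\sqrt{N/B} + T/B\right), \\
V_i &= O\!\left(\sqrt{N/B^{\,i+1}} + V_{i-1}/B\right), \qquad i \ge 1, \\
\sum_{i \ge 0} V_i &= O\!\left(\sqrt{N/B} + T/B\right)
\quad \text{(both terms decrease geometrically in } i\text{)}.
\end{align*}
```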

PR-Tree: Bulkload & Updates
  Bulk-loading
    O((N/B) log_2 N) I/Os → O((N/B) log_{M/B} N) I/Os, using the "grid method" [Agarwal et al. 01]
    The same bound as the Hilbert R-tree, but with a larger constant
  Updates
    Any previous heuristic can be used to update in O(log_B N) I/Os, but without the worst-case query guarantee
    With the logarithmic method:
      Insert: O(log_B N + (1/B) · log_{M/B} N · log_2(N/M)) I/Os
      Delete: O(log_B N) I/Os
  Extending to d dimensions
    Query bound: O((N/B)^(1-1/d) + T/B), still optimal
    Bulk-load and update bounds remain the same
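The logarithmic method mentioned above turns a static, bulk-loaded structure into one that supports insertions. The sketch below shows the generic idea only: level i is either empty or holds a static structure on exactly 2^i items, and an insertion merges full levels upward, like a binary counter. The class name is hypothetical, and `bulk_load` defaults to `tuple` as a stand-in for bulk-loading a PR-tree. Deletions and the in-memory buffer of size M that an external-memory version would use are omitted.

```python
from typing import Any, Callable, List, Optional

class LogMethodIndex:
    """Insert-only index built from static structures of sizes 1, 2, 4, ..."""

    def __init__(self, bulk_load: Callable[[list], Any] = tuple):
        # bulk_load is a stand-in; any static, iterable structure works here.
        self.bulk_load = bulk_load
        self.levels: List[Optional[Any]] = []  # levels[i]: None or 2**i items

    def insert(self, item: Any) -> None:
        carry = [item]
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is None:
                # Empty slot found: rebuild one static structure from the carry.
                self.levels[i] = self.bulk_load(carry)
                return
            # Level is full: absorb its items and continue one level up.
            carry.extend(self.levels[i])
            self.levels[i] = None
            i += 1

    def query(self, predicate: Callable[[Any], bool]) -> list:
        # A query must consult every non-empty level.
        return [x for lvl in self.levels if lvl is not None
                for x in lvl if predicate(x)]
```

Each item takes part in at most one rebuild per level, which is how the amortized insertion bound picks up a logarithmic factor on top of the bulk-loading cost per item.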

Experiments
  Implemented with TPIE
  Structures compared: Priority R-tree, Hilbert R-tree, 4D Hilbert R-tree, TGS R-tree
  Real-life data: TIGER datasets, 16 million rectangles
  Synthetic data: varying from normal to extreme data, 10 million rectangles

Experiments with Real-Life Data
Query performance on the TIGER datasets
Shown: (# I/Os spent in answering a query) / (T/B)

Experiments with Synthetic Data: SIZE
Each side of a rectangle is uniformly distributed in [0, max_side] Queries are squares with area 1%
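A generator for this kind of dataset could look like the following sketch. The slide only fixes the side-length distribution and the relative query area; placing rectangle centers uniformly in the unit square and measuring the 1% against the unit square are assumptions, as are the function names and the seeding.

```python
import random
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def size_dataset(n: int, max_side: float, seed: int = 0) -> List[Rect]:
    """n rectangles whose width and height are each uniform in [0, max_side]."""
    rng = random.Random(seed)
    rects = []
    for _ in range(n):
        w = rng.uniform(0, max_side)
        h = rng.uniform(0, max_side)
        cx, cy = rng.random(), rng.random()  # assumption: centers in the unit square
        rects.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return rects

def square_query(area_fraction: float = 0.01, seed: int = 0) -> Rect:
    """A square query covering `area_fraction` of the unit square."""
    rng = random.Random(seed)
    side = area_fraction ** 0.5
    x = rng.uniform(0, 1 - side)
    y = rng.uniform(0, 1 - side)
    return (x, y, x + side, y + side)
```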

Experiments with Synthetic Data: ASPECT
Fix the area, vary aspect ratio

Experiments with Synthetic Data: SKEWED
Randomly place points, then apply y' = y^c to the y-coordinates
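The transformation is easy to reproduce. The sketch below assumes the points start out uniform in the unit square, which the slide does not state; the function name and seeding are hypothetical. For c > 1 the points crowd toward y = 0, and larger c gives a more extreme skew.

```python
import random
from typing import List, Tuple

def skewed_points(n: int, c: float, seed: int = 0) -> List[Tuple[float, float]]:
    """Uniform points in the unit square with each y replaced by y**c."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        pts.append((x, y ** c))
    return pts
```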

Experiments with Synthetic Data: CLUSTER

Conclusions
  In theory
    The PR-tree is the first R-tree variant that answers a window query in O(√(N/B) + T/B) I/Os in the worst case, which is optimal
  In practice
    Roughly the same as the previous best R-trees on real-life and relatively nicely distributed data
    Outperforms them significantly on more extreme data
  Future work
    How previous heuristics may affect the performance of the PR-tree in the dynamic case

Lower Bound Construction
  Each bounding box intersects at least Ω(√B) queries
  N/B bounding boxes
  O(√N) queries
  There exists a query that intersects at least Ω(√(N/B)) bounding boxes

Pseudo-PR-Tree: Query Complexity
  Nodes v visited where all rectangles in at least one of the priority leaves of v's parent are reported: O(T/B)
  Let v be a visited node such that none of the priority leaves at its parent is reported completely, and consider v's parent u
  (Figure: the query Q in 2D and the corresponding 4D view, with the hyperplanes ymin = ymax(Q) and xmax = xmin(Q).)

Pseudo-PR-Tree: Query Complexity
  The cell in the 4D kd-tree of u is intersected by two different 3-dimensional hyperplanes
  The intersection of each pair of such 3-dimensional hyperplanes is a 2-dimensional hyperplane
  Lemma: the number of cells in a d-dimensional kd-tree that intersect an axis-parallel f-dimensional hyperplane is O((N/B)^(f/d))
  So, # such cells in a 4D kd-tree: O((N/B)^(2/4)) = O(√(N/B))
  Total # nodes visited: O(√(N/B) + T/B)

Experiments with Real-Life Data
  Datasets: TIGER/Line data
  (Figure: bulk-loading results.)
