R-tree Analysis. R-trees - performance analysis How many disk (=node) accesses we’ll need for range nn spatial joins why does it matter?

Slides:



Advertisements
Similar presentations
The A-tree: An Index Structure for High-dimensional Spaces Using Relative Approximation Yasushi Sakurai (NTT Cyber Space Laboratories) Masatoshi Yoshikawa.
Advertisements

1 Spatial Join. 2 Papers to Present “Efficient Processing of Spatial Joins using R-trees”, T. Brinkhoff, H-P Kriegel and B. Seeger, Proc. SIGMOD, 1993.
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Spatial and Temporal Data Mining V. Megalooikonomou Spatial Access Methods (SAMs) II (some slides are based on notes by C. Faloutsos)
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals - case studies - I C. Faloutsos.
Spatial Join Queries. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
Indexing and Range Queries in Spatio-Temporal Databases
A Simple and Efficient Algorithm for R-Tree Packing Scott T. Leutenegger, Mario A. Lopez, Jeffrey Edgington STR Sunho Cho Jeonghun Ahn 1.
CMU SCS : Multimedia Databases and Data Mining Lecture #6: Spatial Access Methods Part III: R-trees C. Faloutsos.
2-dimensional indexing structure
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Spatial Indexing for NN retrieval
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
Accessing Spatial Data
Spatio-Temporal Databases
Project Proposals Simonas Šaltenis Aalborg University Nykredit Center for Database Research Department of Computer Science, Aalborg University.
Spatial Indexing SAMs.
Spatial Information Systems (SIS) COMP Spatial access methods: Indexing.
An Incremental Refining Spatial Join Algorithm for Estimating Query Results in GIS Wan D. Bae, Shayma Alkobaisi, Scott T. Leutenegger Department of Computer.
Spatial Queries Nearest Neighbor Queries.
I/O-Algorithms Lars Arge Aarhus University March 6, 2007.
Spatio-Temporal Databases. Introduction Spatiotemporal Databases: manage spatial data whose geometry changes over time Geometry: position and/or extent.
R-Trees 2-dimensional indexing structure. R-trees 2-dimensional version of the B-tree: B-tree of maximum degree 8; degree between 3 and 8 Internal nodes.
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
Trip Planning Queries F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, S.-H. Teng Boston University.
Spatio-Temporal Databases. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases …..
Advanced Data Structures NTUA 2007 R-trees and Grid File.
R-tree Analysis. R-trees - performance analysis How many disk (=node) accesses we’ll need for range nn spatial joins why does it matter?
Spatial Information Systems (SIS) COMP Spatial access methods: Indexing (part 2)
Spatial Indexing. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
CMU SCS : Multimedia Databases and Data Mining Lecture #6: Spatial Access Methods Part III: R-trees C. Faloutsos.
Fast Subsequence Matching in Time-Series Databases Christos Faloutsos M. Ranganathan Yannis Manolopoulos Department of Computer Science and ISR University.
Improving Min/Max Aggregation over Spatial Objects Donghui Zhang, Vassilis J. Tsotras University of California, Riverside ACM GIS’01.
AAU A Trajectory Splitting Model for Efficient Spatio-Temporal Indexing Presented by YuQing Zhang  Slobodan Rasetic Jorg Sander James Elding Mario A.
Spatial Data Management Chapter 28. Types of Spatial Data Point Data –Points in a multidimensional space E.g., Raster data such as satellite imagery,
The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.
R ++ -tree: an efficient spatial access method for highly redundant point data Martin Šumák, Peter Gurský University of P. J. Šafárik in Košice.
Advanced Data Structures NTUA 2007 R-trees and Grid File.
Nearest Neighbor Queries Chris Buzzerd, Dave Boerner, and Kevin Stewart.
Spatio-temporal Pattern Queries M. Hadjieleftheriou G. Kollios P. Bakalov V. J. Tsotras.
Bin Yao (Slides made available by Feifei Li) R-tree: Indexing Structure for Data in Multi- dimensional Space.
August 30, 2004STDBM 2004 at Toronto Extracting Mobility Statistics from Indexed Spatio-Temporal Datasets Yoshiharu Ishikawa Yuichi Tsukamoto Hiroyuki.
Performance Comparison of xBR-trees and R*-trees for Single Dataset Spatial Queries Performance Comparison of xBR-trees and R*-trees for Single Dataset.
Spatial Indexing Techniques Introduction to Spatial Computing CSE 5ISC Some slides adapted from Spatial Databases: A Tour by Shashi Shekhar Prentice Hall.
R-Trees: A Dynamic Index Structure For Spatial Searching Antonin Guttman.
R-trees: An Average Case Analysis. R-trees - performance analysis How many disk (=node) accesses we ’ ll need for range nn spatial joins why does it matter?
Time Series Sequence Matching Jiaqin Wang CMPS 565.
Efficient OLAP Operations in Spatial Data Warehouses Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao Department of Computer Science Hong Kong.
1 CSIS 7101: CSIS 7101: Spatial Data (Part 1) The R*-tree : An Efficient and Robust Access Method for Points and Rectangles Rollo Chan Chu Chung Man Mak.
1 Complex Spatio-Temporal Pattern Queries Cahide Sen University of Minnesota.
STR: A Simple and Efficient Algorithm for R-Tree Packing.
R* Tree By Rohan Sadale Akshay Kulkarni.  Motivation  Optimization criteria for R* Tree  High level Algorithm  Example  Performance Agenda.
Presenters: Amool Gupta Amit Sharma. MOTIVATION Basic problem that it addresses?(Why) Other techniques to solve same problem and how this one is step.
Spatio-Temporal Databases. Term Project Groups of 2 students You can take a look on some project ideas from here:
Rethinking Choices for Multi-dimensional Point Indexing You Jung Kim and Jignesh M. Patel University of Michigan.
Jeremy Iverson & Zhang Yun 1.  Chapter 6 Key Concepts ◦ Structures and access methods ◦ R-Tree  R*-Tree  Mobile Object Indexing  Questions 2.
1 Spatial Query Processing using the R-tree Donghui Zhang CCIS, Northeastern University Feb 8, 2005.
1 R-Trees Guttman. 2 Introduction Range queries in multiple dimensions: Computer Aided Design (CAD) Geo-data applications Support special data objects.
Spatial Data Management
Strategies for Spatial Joins
Spatial Indexing.
Spatio-temporal Pattern Queries
Area and Perimeter (P3) Definition:
Spatio-Temporal Databases
15-826: Multimedia Databases and Data Mining
Distributed Probabilistic Range-Aggregate Query on Uncertain Data
Lecture 21: B-Trees Monday, Nov. 19, 2001.
Spatial Indexing I R-trees
R-trees: An Average Case Analysis
Donghui Zhang, Tian Xia Northeastern University
Presentation transcript:

R-tree Analysis

R-trees - performance analysis How many disk (=node) accesses we’ll need for range nn spatial joins why does it matter?

R-trees - performance analysis A: because we can design split etc algorithms accordingly; also, do query-optimization motivating question: on, e.g., split, should we try to minimize the area (volume)? the perimeter? the overlap? or a weighted combination? why?

R-trees - performance analysis How many disk accesses (expected value) for range queries? query distribution wrt location? “ “ wrt size?

R-trees - performance analysis How many disk accesses for range queries? query distribution wrt location? uniform; (biased) “ “ wrt size? uniform

R-trees - performance analysis easier case: we know the positions of data nodes and their MBRs, eg:

R-trees - performance analysis How many times will P1 be retrieved (unif. queries)? P1 x1 x2

R-trees - performance analysis How many times will P1 be retrieved (unif. POINT queries)? P1 x1 x

R-trees - performance analysis How many times will P1 be retrieved (unif. POINT queries)? A: x1*x2 P1 x1 x

R-trees - performance analysis How many times will P1 be retrieved (unif. queries of size q1xq2)? P1 x1 x q1 q2

R-trees - performance analysis Minkowski sum q1 q2 q1/2 q2/2

R-trees - performance analysis How many times will P1 be retrieved (unif. queries of size q1xq2)? A: (x1+q1)*(x2+q2) P1 x1 x q1 q2

R-trees - performance analysis Thus, given a tree with n nodes (i=1,... n) we expect

R-trees - performance analysis Thus, given a tree with n nodes (i=1,... n) we expect ‘volume’ ‘surface area’ count

R-trees - performance analysis Observations: for point queries: only volume matters for horizontal-line queries: (q2=0): vertical length matters for large queries (q1, q2 >> 0): the count N matters overlap: does not seem to matter (but it is related to area) formula: easily extendible to n dimensions

R-trees - performance analysis Conclusions: splits should try to minimize area and perimeter ie., we want few, small, square-like parent MBRs rule of thumb: shoot for queries with q1=q2 = 0.1 (or =0.05 or so).

More general Model What if we have only the dataset D and the set of queries S ? We should “predict” the structures of a “good” R-tree for this dataset. Then use the previous model to estimate the average query performance for S

Unifrom dataset Assume that the dataset (that contains only rectangles) is uniformly distributed in space. Density of a set of N MBRs is the average number of MBRs that contain a given point in space. OR the total area covered by the MBRs over the area of the work space. N boxes with average size s= (s 1,s 2 ), D(N,s) = N s 1 s 2 If s 1 =s 2 =s, then:

Density of Leaf nodes Assume a dataset of N rectangles. If the average page capacity is f, then we have N ln = N/f leaf nodes. If D 1 is the density of the leaf MBRs, and the average area of each leaf MBR is s 2, then: So, we can estimate s, from N, f, D 1 We need to estimate D 1 from the dataset’s density…

Estimating D 1 Consider a leaf node that contains f MBRs. Then for each side of the leaf node MBR we have: MBRs Also, N ln leaf nodes contain N MBRs, uniformly distributed. The average distance between the centers of two consecutive MBRs is t= (assuming [0,1] 2 space) t

Estimating D 1 Combining the previous observations we can estimate the density at the leaf level, from the density of the dataset: We can apply the same ideas recursively to the other levels of the tree.

R-trees–performance analysis Assuming Uniform distribution: where And D is the density of the dataset, f the fanout [TS96], N the number of objects

Project Deadlines Phase 1 : Proposal Oct 13, 2004 Phase 2 : Progress Report Nov 12, 2004 Phase 3: Final Report Dec 10, 2004