R-trees: An Average Case Analysis. R-trees - performance analysis How many disk (=node) accesses we ’ ll need for range nn spatial joins why does it matter?

Slides:



Advertisements
Similar presentations
The A-tree: An Index Structure for High-dimensional Spaces Using Relative Approximation Yasushi Sakurai (NTT Cyber Space Laboratories) Masatoshi Yoshikawa.
Advertisements

CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Spatial and Temporal Data Mining V. Megalooikonomou Spatial Access Methods (SAMs) II (some slides are based on notes by C. Faloutsos)
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals - case studies - I C. Faloutsos.
Spatial Join Queries. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
Indexing and Range Queries in Spatio-Temporal Databases
A Simple and Efficient Algorithm for R-Tree Packing Scott T. Leutenegger, Mario A. Lopez, Jeffrey Edgington STR Sunho Cho Jeonghun Ahn 1.
CMU SCS : Multimedia Databases and Data Mining Lecture #6: Spatial Access Methods Part III: R-trees C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals - case studies Part III (regions, quadtrees, knn queries) C. Faloutsos.
2-dimensional indexing structure
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Spatial Indexing for NN retrieval
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
Spatio-Temporal Databases
Spatial Indexing SAMs.
Lars Arge1, Mark de Berg2, Herman Haverkort3 and Ke Yi1
Chapter 3: Data Storage and Access Methods
An Incremental Refining Spatial Join Algorithm for Estimating Query Results in GIS Wan D. Bae, Shayma Alkobaisi, Scott T. Leutenegger Department of Computer.
Spatial Queries Nearest Neighbor Queries.
R-tree Analysis. R-trees - performance analysis How many disk (=node) accesses we’ll need for range nn spatial joins why does it matter?
Spatio-Temporal Databases. Introduction Spatiotemporal Databases: manage spatial data whose geometry changes over time Geometry: position and/or extent.
R-Trees 2-dimensional indexing structure. R-trees 2-dimensional version of the B-tree: B-tree of maximum degree 8; degree between 3 and 8 Internal nodes.
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
Trip Planning Queries F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, S.-H. Teng Boston University.
Spatio-Temporal Databases. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases …..
Advanced Data Structures NTUA 2007 R-trees and Grid File.
R-tree Analysis. R-trees - performance analysis How many disk (=node) accesses we’ll need for range nn spatial joins why does it matter?
Spatial Indexing. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
CMU SCS : Multimedia Databases and Data Mining Lecture #6: Spatial Access Methods Part III: R-trees C. Faloutsos.
Fast Subsequence Matching in Time-Series Databases Christos Faloutsos M. Ranganathan Yannis Manolopoulos Department of Computer Science and ISR University.
Improving Min/Max Aggregation over Spatial Objects Donghui Zhang, Vassilis J. Tsotras University of California, Riverside ACM GIS’01.
AAU A Trajectory Splitting Model for Efficient Spatio-Temporal Indexing Presented by YuQing Zhang  Slobodan Rasetic Jorg Sander James Elding Mario A.
Spatial Data Management Chapter 28. Types of Spatial Data Point Data –Points in a multidimensional space E.g., Raster data such as satellite imagery,
The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.
Handling Spatial Data In P2P Systems Verena Kantere, Timos Sellis, Yannis Kouvaras.
A Quantitative Analysis and Performance Study For Similar- Search Methods In High- Dimensional Space Presented By Umang Shah Koushik.
CMU SCS : Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos.
Advanced Data Structures NTUA 2007 R-trees and Grid File.
Nearest Neighbor Queries Chris Buzzerd, Dave Boerner, and Kevin Stewart.
Bin Yao (Slides made available by Feifei Li) R-tree: Indexing Structure for Data in Multi- dimensional Space.
August 30, 2004STDBM 2004 at Toronto Extracting Mobility Statistics from Indexed Spatio-Temporal Datasets Yoshiharu Ishikawa Yuichi Tsukamoto Hiroyuki.
Optimizing Multidimensional Index Trees for Main Memory Access Author: Kihong Kim, Sang K. Cha, Keunjoo Kwon Members: Iris Zhang, Grace Yung, Kara Kwon,
Performance Comparison of xBR-trees and R*-trees for Single Dataset Spatial Queries Performance Comparison of xBR-trees and R*-trees for Single Dataset.
Spatial Indexing Techniques Introduction to Spatial Computing CSE 5ISC Some slides adapted from Spatial Databases: A Tour by Shashi Shekhar Prentice Hall.
R-Trees: A Dynamic Index Structure For Spatial Searching Antonin Guttman.
Time Series Sequence Matching Jiaqin Wang CMPS 565.
1 CSIS 7101: CSIS 7101: Spatial Data (Part 1) The R*-tree : An Efficient and Robust Access Method for Points and Rectangles Rollo Chan Chu Chung Man Mak.
The Power-Method: A Comprehensive Estimation Technique for Multi- Dimensional Queries Yufei Tao U. Hong Kong Christos Faloutsos CMU Dimitris Papadias Hong.
STR: A Simple and Efficient Algorithm for R-Tree Packing.
R* Tree By Rohan Sadale Akshay Kulkarni.  Motivation  Optimization criteria for R* Tree  High level Algorithm  Example  Performance Agenda.
Presenters: Amool Gupta Amit Sharma. MOTIVATION Basic problem that it addresses?(Why) Other techniques to solve same problem and how this one is step.
Spatio-Temporal Databases. Term Project Groups of 2 students You can take a look on some project ideas from here:
Rethinking Choices for Multi-dimensional Point Indexing You Jung Kim and Jignesh M. Patel University of Michigan.
Jeremy Iverson & Zhang Yun 1.  Chapter 6 Key Concepts ◦ Structures and access methods ◦ R-Tree  R*-Tree  Mobile Object Indexing  Questions 2.
1 Spatial Query Processing using the R-tree Donghui Zhang CCIS, Northeastern University Feb 8, 2005.
1 R-Trees Guttman. 2 Introduction Range queries in multiple dimensions: Computer Aided Design (CAD) Geo-data applications Support special data objects.
Spatial Data Management
Strategies for Spatial Joins
Spatial Indexing.
Area and Perimeter (P3) Definition:
Spatio-Temporal Databases
15-826: Multimedia Databases and Data Mining
Distributed Probabilistic Range-Aggregate Query on Uncertain Data
Spatial Indexing I R-trees
15-826: Multimedia Databases and Data Mining
R-trees: An Average Case Analysis
Donghui Zhang, Tian Xia Northeastern University
C. Faloutsos Spatial Access Methods - R-trees
Presentation transcript:

R-trees: An Average Case Analysis

R-trees - performance analysis How many disk (=node) accesses we ’ ll need for range nn spatial joins why does it matter?

R-trees - performance analysis A: because we can design insert,split,delete algorithms accordingly B: we can do query-optimization motivating question: on, e.g., split, should we try to minimize the area (volume)? the perimeter? the overlap? or a weighted combination? why?

R-trees - performance analysis How many disk accesses (expected value) for range queries? query distribution wrt location? “ “ wrt size?

R-trees - performance analysis How many disk accesses for range queries? query distribution wrt location? uniform; (biased) “ “ wrt size? uniform

R-trees - performance analysis easier case: we know the positions of data nodes and their MBRs, eg:

R-trees - performance analysis How many times will P1 be retrieved (unif. queries)? P1 x1 x2

R-trees - performance analysis How many times will P1 be retrieved (unif. POINT queries)? P1 x1 x

R-trees - performance analysis How many times will P1 be retrieved (unif. POINT queries)? A: x1*x2 P1 x1 x

R-trees - performance analysis How many times will P1 be retrieved (unif. queries of size q1xq2)? P1 x1 x q1 q2

R-trees - performance analysis Minkowski sum q1 q2 q1/2 q2/2

R-trees - performance analysis How many times will P1 be retrieved (unif. queries of size q1xq2)? A: (x1+q1)*(x2+q2) P1 x1 x q1 q2

R-trees - performance analysis Thus, given a tree with n nodes (i=1,... n) we expect

R-trees - performance analysis Thus, given a tree with n nodes (i=1,... n) we expect ‘ volume ’ ‘ surface area ’ count

R-trees - performance analysis Observations: for point queries: only volume matters for horizontal-line queries: (q2=0): vertical length matters for large queries (q1, q2 >> 0): the count N matters overlap: does not seem to matter (but it is related to area) formula: easily extendible to n dimensions

R-trees - performance analysis Conclusions: splits should try to minimize area and perimeter ie., we want few, small, square-like parent MBRs --- similar to R*-tree optimizations

More general Model What if we have only the dataset D and the set of query distribution S ? We should “ predict ” the structures of a “ good ” R-tree for this dataset. Then use the previous model to estimate the average query performance for S For point datasets, we can use the Fractal Dimension to find the “ average ” structure of the tree

Unifrom dataset Assume that the dataset (that contains only rectangles) is uniformly distributed in space. Density of a set of N MBRs is the average number of MBRs that contain a given point in space. OR the total area covered by the MBRs over the area of the work space. N boxes with average size s= (s 1,s 2 ), D(N,s) = N s 1 s 2 If s 1 =s 2 =s, then: (one more assumption here: squares)

Density of Leaf nodes Assume a dataset of N rectangles. If the average page capacity is f, then we have N ln = N/f leaf nodes. If D 1 is the density of the leaf MBRs, and the average area of each leaf MBR is s 1 2, then: So, we can estimate s 1, from N, f, D 1 We need to estimate D 1 from the dataset ’ s density…

Estimating D 1 Consider a leaf node that contains f MBRs. (if MBR square) Then for each side of the leaf node MBR we have: MBRs Also, N ln leaf nodes contain N MBRs, uniformly distributed. The average distance between the centers of two consecutive MBRs is t= (assuming [0,1] 2 space) t

Estimating D 1 Combining the previous observations we can estimate the density at the leaf level, from the density of the dataset: We can apply the same ideas recursively to the other levels of the tree.

R-trees–performance analysis Assuming Uniform distribution: where And D is the density of the dataset, f the fanout [TS96], N the number of objects

References Christos Faloutsos and Ibrahim Kamel. “ Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension ”. Proc. ACM PODS, Yannis Theodoridis and Timos Sellis. “ A Model for the Prediction of R- tree Performance ”. Proc. ACM PODS, 1996.