CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #6: Spatial Access Methods Part III: R-trees C. Faloutsos.

CMU SCS Copyright: C. Faloutsos (2012)
Must-read material
- Textbook, Chapter 5.2
- Ramakrishnan+Gehrke, Chapter 28.6
- Guttman, A. (June 1984). R-Trees: A Dynamic Index Structure for Spatial Searching. Proc. ACM SIGMOD, Boston, Mass.
- Ibrahim Kamel and Christos Faloutsos, Hilbert R-tree: An improved R-tree using fractals. Proc. of VLDB Conference, Santiago, Chile, Sept. 1994.

Outline
Goal: ‘Find similar / interesting things’
- Intro to DB
- Indexing - similarity search
- Data Mining

Indexing - Detailed outline
- primary key indexing
- secondary key / multi-key indexing
- spatial access methods
  - problem dfn
  - z-ordering
  - R-trees
  - ...
- text
- ...

Indexing - more detailed outline
R-trees
- main idea; file structure
- algorithms: insertion/split
- deletion
- search: range, nn, spatial joins
- performance analysis
- variations (packed; hilbert; ...)

Reminder: problem
Given a collection of geometric objects (points, lines, polygons, ...), organize them on disk to answer spatial queries (range, nn, etc.).

R-trees
z-ordering cuts regions into pieces, which then requires duplicate elimination. How could we avoid that?
Idea: try to extend/merge B-trees and k-d trees.

(first attempt: k-d-B-trees) [Robinson, 81]
If f is the fanout, split the point set into f parts; and so on, recursively.
But:
- insertions/deletions are tricky (splits may propagate downwards and upwards)
- no guarantee on space utilization

R-trees [Guttman 84]
Main idea: allow parents to overlap!
- => guaranteed 50% utilization
- => easier insertion/split algorithms
- (only deal with Minimum Bounding Rectangles - MBRs)

R-trees
E.g., with fanout 4: group nearby rectangles into parent MBRs; each group -> disk page.
[Figure: rectangles A to J grouped under parent MBRs P1 to P4; the root node holds the entries P1, P2, P3, P4, and the leaf pages hold the rectangles.]

R-trees - format of nodes
- leaf nodes: {(MBR; obj-ptr)}
- non-leaf nodes: {(MBR; node-ptr)}
Each MBR is stored as x-low, x-high, y-low, y-high.
[Figure: root node with entries P1 to P4; the leaf under P1 holds A, B, C.]
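
The node format above could be sketched in Python roughly as follows. This is only an illustration: the class names `MBR`, `Entry`, `Node`, the helper `enclose`, the integer object ids, and all coordinates are assumptions, not part of the slides, and an in-memory reference stands in for a disk page pointer.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class MBR:
    x_low: float
    x_high: float
    y_low: float
    y_high: float


@dataclass
class Entry:
    mbr: MBR
    ptr: Union[int, "Node"]  # leaf: object id (hypothetical); non-leaf: child node


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def enclose(mbrs: List[MBR]) -> MBR:
    """Smallest rectangle covering all the given MBRs."""
    return MBR(min(m.x_low for m in mbrs), max(m.x_high for m in mbrs),
               min(m.y_low for m in mbrs), max(m.y_high for m in mbrs))


# Leaf P1 holds rectangles A, B, C (coordinates are made up for illustration).
p1 = Node(is_leaf=True, entries=[
    Entry(MBR(0, 2, 0, 1), ptr=1),  # A
    Entry(MBR(1, 3, 2, 4), ptr=2),  # B
    Entry(MBR(2, 5, 1, 2), ptr=3),  # C
])
# The parent entry stores the MBR of the whole leaf plus a pointer to it.
root = Node(is_leaf=False,
            entries=[Entry(enclose([e.mbr for e in p1.entries]), ptr=p1)])
```

The same `Entry` shape serves both node kinds; only the meaning of `ptr` changes, which mirrors the two formats listed on the slides.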

R-trees - range search?
[Figure: the example tree, with rectangles A to J under parents P1 to P4.]

R-trees - range search
Observations:
- every parent node completely covers its ‘children’
- a child MBR may be covered by more than one parent; it is stored under ONLY ONE of them (i.e., no need for dup. elim.)
- a point query may follow multiple branches
- everything works for any dimensionality

R-trees - insertion
E.g., rectangle ‘X’. [Figure: X added to the example tree.]
E.g., rectangle ‘Y’: extend a suitable parent. [Figure: Y added to the example tree, with one parent MBR extended.]
Q: how to measure ‘suitability’?
A: by increase in area (volume) (more details: later, under ‘performance analysis’)
Q: what if there is no room? how to split?
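
A minimal sketch of the ‘suitability’ measure, assuming rectangles are tuples `(x_low, y_low, x_high, y_high)`. The function names are hypothetical, and the tie-break on smaller parent area is an added assumption (it is the rule in Guttman's ChooseLeaf, not something stated on the slides).

```python
def area(m):
    return (m[2] - m[0]) * (m[3] - m[1])


def union(a, b):
    """MBR covering both rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(parent, new):
    """Extra area the parent MBR needs in order to cover the new rectangle."""
    return area(union(parent, new)) - area(parent)


def choose_parent(parents, new):
    """Index of the parent whose MBR grows least; ties go to the smaller MBR."""
    return min(range(len(parents)),
               key=lambda i: (enlargement(parents[i], new), area(parents[i])))


parents = [(0, 0, 4, 4), (6, 0, 10, 4)]  # two made-up parent MBRs
y = (4, 1, 5, 2)                         # new rectangle, outside both
best = choose_parent(parents, y)         # parent 0 grows by 4, parent 1 by 8
```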

R-trees - insertion
E.g., rectangle ‘W’: focus on ‘P1’. How to split?
[Figure: parent P1 with rectangles A, B, C and the labels W, K.]
- (A1: plane sweep, until 50% of rectangles)
- A2: ‘linear’ split
- A3: quadratic split
- A4: exponential split

R-trees - insertion & split
Pick two rectangles as ‘seeds’; assign each rectangle ‘R’ to the ‘closest’ ‘seed’.
[Figure: seed1, seed2 and a rectangle R.]
Q: how to measure ‘closeness’?
A: by increase of area (volume)
Smart idea: pre-sort rectangles according to delta of closeness (i.e., schedule easiest choices first!)
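
The seed-based split can be sketched as below, loosely following the quadratic split of [Guttman 84]. Details the slides do not give are assumptions here: seeds are the pair that wastes the most area when put together, `min_fill` is the minimum number of entries per node, and ties are broken by smaller cover area and then by fewer entries. Rectangles are tuples `(x_low, y_low, x_high, y_high)`.

```python
from itertools import combinations


def area(m):
    return (m[2] - m[0]) * (m[3] - m[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def quadratic_split(rects, min_fill):
    # Seeds: the pair that would waste the most area if kept together.
    def waste(pair):
        a, b = rects[pair[0]], rects[pair[1]]
        return area(union(a, b)) - area(a) - area(b)

    i, j = max(combinations(range(len(rects)), 2), key=waste)
    groups = [[rects[i]], [rects[j]]]
    covers = [rects[i], rects[j]]
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]

    def grow(g, r):
        # 'Closeness' of r to group g: increase of the group's MBR area.
        return area(union(covers[g], r)) - area(covers[g])

    while rest:
        # If a group needs every remaining rectangle to reach min_fill, hand them over.
        short = [g for g in (0, 1) if len(groups[g]) + len(rest) <= min_fill]
        if short:
            g = short[0]
            for r in rest:
                groups[g].append(r)
                covers[g] = union(covers[g], r)
            break
        # Easiest choice first: the rectangle with the largest preference gap.
        r = max(rest, key=lambda c: abs(grow(0, c) - grow(1, c)))
        rest.remove(r)
        g = min((0, 1), key=lambda k: (grow(k, r), area(covers[k]), len(groups[k])))
        groups[g].append(r)
        covers[g] = union(covers[g], r)
    return groups, covers


# Made-up overflowing node: three rectangles near the origin, two far away.
rects = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (10, 10, 11, 11), (11, 10, 12, 11)]
groups, covers = quadratic_split(rects, min_fill=2)
```

Picking the rectangle with the largest gap between its two enlargements is one way to realize the "schedule easiest choices first" idea without an explicit pre-sort.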

R-trees - insertion - pseudocode
- decide which parent to put the new rectangle into (‘closest’ parent)
- if overflow, split into two, using (say) the quadratic split algorithm
  - propagate the split upwards, if necessary
- update the MBRs of the affected parents

R-trees - insertion - observations
Many more split algorithms exist (next!)

R-trees - deletion
- delete rectangle
- if underflow:
  - temporarily delete all siblings (!);
  - delete the parent node and
  - re-insert them

R-trees - deletion
Variations: later (e.g., Hilbert R-trees w/ 2-to-1 merge)

R-trees - range search
pseudocode:
  check the root
  for each branch,
    if its MBR intersects the query rectangle
      apply range-search (or print out, if this is a leaf)
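
The range-search pseudocode above might look like this in Python. The dictionary-based node layout (`{"leaf": bool, "entries": [(mbr, ptr), ...]}`), the tuple form `(x_low, y_low, x_high, y_high)` for rectangles, and the tiny example tree are all assumptions made for the sketch.

```python
def intersects(a, b):
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_search(node, query, out=None):
    """Collect the objects whose MBR intersects the query rectangle."""
    if out is None:
        out = []
    for mbr, ptr in node["entries"]:
        if intersects(mbr, query):
            if node["leaf"]:
                out.append(ptr)                # report the object
            else:
                range_search(ptr, query, out)  # descend into the child node
    return out


# Made-up two-level tree with two leaves.
leaf1 = {"leaf": True, "entries": [((0, 0, 1, 1), "A"), ((2, 2, 3, 3), "B")]}
leaf2 = {"leaf": True, "entries": [((5, 5, 6, 6), "C"), ((7, 1, 8, 2), "D")]}
tree = {"leaf": False, "entries": [((0, 0, 3, 3), leaf1), ((5, 1, 8, 6), leaf2)]}

hits = range_search(tree, (2.5, 2.5, 5.5, 5.5))
```

Here the query overlaps both parent MBRs, so both branches are followed, which is the "may follow multiple branches" behavior noted earlier in the slides.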

R-trees - nn search
[Figure: the example tree (A to J under P1 to P4) with a query point q.]
Q: How? (find near neighbor; refine...)
A1: depth-first search; then, range query

R-trees - nn search
A2: [Roussopoulos+, sigmod95]:
- priority queue, with promising MBRs, and their best and worst-case distance
Main idea: consider only P2 and P4, for illustration. If the worst-case distance of P2 is smaller than the best-case distance of P4, then P4 is useless for 1-nn.
[Figure: query point q, P2 (with D, E) and P4 (with H, J), with ‘worst of P2’ and ‘best of P4’ marked.]
Q: what is really the worst of, say, P2?
A: the smallest of the two red segments!
[Figure: q, P2, and two red segments drawn from q to P2.]
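
The ‘best’ and ‘worst’ distances can be sketched with the MINDIST and MINMAXDIST bounds of the Roussopoulos et al. paper. Reading the "smallest of the two red segments" as MINMAXDIST is an assumption, as are the function names and the made-up coordinates; rectangles are given by their `low` and `high` corners.

```python
import math


def mindist(q, low, high):
    """Best case: distance from q to the nearest point of the rectangle."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, low, high)))


def minmaxdist(q, low, high):
    """Worst case: some object inside the MBR is guaranteed within this distance."""
    # Per dimension: the nearer and the farther face coordinate.
    near = [l if x <= (l + h) / 2 else h for x, l, h in zip(q, low, high)]
    far = [l if x >= (l + h) / 2 else h for x, l, h in zip(q, low, high)]
    far_sq = [(x - f) ** 2 for x, f in zip(q, far)]
    total_far = sum(far_sq)
    # For each dimension k: nearest face in k, farthest corner along the others.
    best = min(total_far - far_sq[k] + (q[k] - near[k]) ** 2
               for k in range(len(q)))
    return math.sqrt(best)


def can_prune(q, candidate, other):
    """True if `candidate` cannot hold the 1-nn because `other` surely does better."""
    return mindist(q, *candidate) > minmaxdist(q, *other)


q = (0.0, 0.0)
p2 = ((1.0, 1.0), (3.0, 2.0))  # made-up MBR close to q
p4 = ((6.0, 6.0), (8.0, 9.0))  # made-up MBR far from q
```

In 2-d, `minmaxdist` takes the minimum over two candidate segments (one per dimension), which matches the two-segment picture on the slide.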

R-trees - nn search
Variations: [Hjaltason & Samet] incremental nn:
- build a priority queue
- scan enough of the tree, to make sure you have the k nn
- to find the (k+1)-th, check the queue, and scan some more of the tree
‘optimal’ (but, may need too much memory)
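
A rough sketch of the incremental (best-first) idea, written as a Python generator so that asking for one more neighbor simply resumes the scan. The node layout (`{"leaf": bool, "entries": [(mbr, ptr), ...]}`), the tuple form `(x_low, y_low, x_high, y_high)`, and the example tree are assumptions; objects are treated as being exactly their MBRs, so the distance to the MBR is taken as the distance to the object.

```python
import heapq
import itertools
import math


def mindist(q, mbr):
    dx = max(mbr[0] - q[0], 0.0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0.0, q[1] - mbr[3])
    return math.hypot(dx, dy)


def incremental_nn(root, q):
    """Yield (distance, object) pairs in increasing distance from q."""
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(0.0, next(tie), False, root)]
    while heap:
        dist, _, is_object, item = heapq.heappop(heap)
        if is_object:
            yield dist, item  # nothing left in the queue can be closer
            continue
        for mbr, ptr in item["entries"]:
            # Children of a leaf are objects; children of other nodes are nodes.
            heapq.heappush(heap, (mindist(q, mbr), next(tie), item["leaf"], ptr))


leaf1 = {"leaf": True, "entries": [((0, 0, 1, 1), "A"), ((2, 2, 3, 3), "B")]}
leaf2 = {"leaf": True, "entries": [((5, 5, 6, 6), "C"), ((7, 1, 8, 2), "D")]}
tree = {"leaf": False, "entries": [((0, 0, 3, 3), leaf1), ((5, 1, 8, 6), leaf2)]}

first_three = [obj for _, obj in itertools.islice(incremental_nn(tree, (4.5, 4.5)), 3)]
```

The queue can grow with the number of entries touched, which is the memory concern the slide mentions.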

R-trees - spatial joins
Spatial joins: find (quickly) all counties intersecting lakes.
Assume that they are both organized in R-trees:
  for each parent P1 of tree T1
    for each parent P2 of tree T2
      if their MBRs intersect,
        process them recursively (i.e., check their children)
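
The nested-loop join above could be sketched as follows. The node layout (`{"leaf": bool, "entries": [(mbr, ptr), ...]}`), the tuple form `(x_low, y_low, x_high, y_high)`, the handling of trees with different heights, and the county/lake data are assumptions made for illustration; the join is on MBRs only, with no exact-geometry refinement step.

```python
def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_join(n1, n2, out=None):
    """Report all pairs (obj1, obj2) whose MBRs intersect."""
    if out is None:
        out = []
    for mbr1, p1 in n1["entries"]:
        for mbr2, p2 in n2["entries"]:
            if not intersects(mbr1, mbr2):
                continue  # nothing below these two entries can intersect
            if n1["leaf"] and n2["leaf"]:
                out.append((p1, p2))
            elif n1["leaf"]:
                # Only T2 has levels left: keep the T1 entry fixed and descend T2.
                spatial_join({"leaf": True, "entries": [(mbr1, p1)]}, p2, out)
            elif n2["leaf"]:
                spatial_join(p1, {"leaf": True, "entries": [(mbr2, p2)]}, out)
            else:
                spatial_join(p1, p2, out)
    return out


county_leaf = {"leaf": True,
               "entries": [((0, 0, 4, 4), "county1"), ((4, 0, 8, 4), "county2")]}
counties = {"leaf": False, "entries": [((0, 0, 8, 4), county_leaf)]}
lakes = {"leaf": True,
         "entries": [((3, 1, 5, 2), "lakeA"), ((9, 9, 10, 10), "lakeB")]}

pairs = spatial_join(counties, lakes)
```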

R-trees - spatial joins
Improvements - variations:
- [Seeger+, sigmod 92]: do some pre-filtering; do plane-sweeping to avoid N1 * N2 tests for intersection
- [Lo & Ravishankar, sigmod 94]: ‘seeded’ R-trees
(FYI, many more papers on spatial joins, without R-trees: [Koudas+ Sevcik], etc.)

R-trees - performance analysis
How many disk (=node) accesses will we need for
- range
- nn
- spatial joins
Why does it matter?
A: because we can design split etc. algorithms accordingly; also, do query optimization

R-trees - performance analysis
Motivating question: on, e.g., split, should we try to minimize the area (volume)? the perimeter? the overlap? or a weighted combination? why?

R-trees - performance analysis
How many disk accesses for range queries?
- query distribution wrt location? uniform; (biased)
- query distribution wrt size? uniform

R-trees - performance analysis
Easier case: we know the positions of the parent MBRs.
How many times will P1, a parent MBR with sides x1 and x2, be retrieved? (The answers below are per query, assuming the address space is the unit square.)
- uniform POINT queries: A: x1*x2
- uniform queries of size q1 x q2: A: (x1+q1)*(x2+q2)
[Figure: P1 with sides x1 and x2, inflated by q1 and q2 to a rectangle with sides x1+q1 and x2+q2.]

R-trees - performance analysis
Thus, given a tree with N nodes (i = 1, ..., N) we expect

\[
\begin{aligned}
\#\mathrm{DiskAccesses}(q_1, q_2) &= \sum_{i=1}^{N} (x_{i,1} + q_1)(x_{i,2} + q_2) \\
&= \sum_{i=1}^{N} x_{i,1}\, x_{i,2} \;+\; q_2 \sum_{i=1}^{N} x_{i,1} \;+\; q_1 \sum_{i=1}^{N} x_{i,2} \;+\; q_1 q_2 N
\end{aligned}
\]

The first term is the ‘volume’, the two middle terms are the ‘surface area’, and the last term is the ‘count’.
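
As a quick numeric check of the formula, the sketch below evaluates both forms for a made-up set of node MBRs. It assumes coordinates normalized to the unit square and ignores boundary effects; the function names and the sample sizes are hypothetical.

```python
def expected_accesses(sides, q1, q2):
    """Direct form: sum over nodes of (x_i1 + q1) * (x_i2 + q2)."""
    return sum((x1 + q1) * (x2 + q2) for x1, x2 in sides)


def expected_accesses_expanded(sides, q1, q2):
    """Expanded form: 'volume' + 'surface area' + 'count' terms."""
    volume = sum(x1 * x2 for x1, x2 in sides)
    surface = q2 * sum(x1 for x1, _ in sides) + q1 * sum(x2 for _, x2 in sides)
    count = q1 * q2 * len(sides)
    return volume + surface + count


# Two made-up node MBRs, given as (x-side, y-side).
sides = [(0.2, 0.1), (0.3, 0.4)]
direct = expected_accesses(sides, 0.1, 0.1)             # 0.3*0.2 + 0.4*0.5
expanded = expected_accesses_expanded(sides, 0.1, 0.1)  # 0.14 + 0.10 + 0.02
```

Setting `q1 = q2 = 0` leaves only the volume term, which is the point-query case.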

R-trees - performance analysis
Observations:
- for point queries: only volume matters
- for horizontal-line queries (q2=0): vertical length matters
- for large queries (q1, q2 >> 0): the count N matters

R-trees - performance analysis
Observations (cont’d):
- overlap: does not seem to matter
- formula: easily extendible to n dimensions
- (for even more details: [Pagel+, PODS93], [Kamel+, CIKM93])
[Photo: Bernd-Uwe Pagel]

R-trees - performance analysis
Conclusions:
- splits should try to minimize area and perimeter
- i.e., we want few, small, square-like parent MBRs
- rule of thumb: shoot for queries with q1=q2 = 0.1 (or =0.5 or so)

R-trees - performance analysis
Range queries: how many disk accesses, if we just know that we have N points in n-d space?
A: cannot tell! We need to know the distribution.

R-trees - performance analysis
What are obvious and/or realistic distributions?
- A: uniform
- A: Gaussian / mixture of Gaussians
- A: self-similar / fractal. Fractal dimension ~ intrinsic dimension

R-trees - performance analysis
Formulas for range queries and k-nn queries: use fractal dimension [Kamel+, PODS94], [Korn+ ICDE2000], [Kriegel+, PODS97]

R-trees - variations
Guttman’s R-trees sparked much follow-up work:
- can we do better splits? (i.e., defer splits?)
- what about static datasets (no ins/del/upd)?
- what about other bounding shapes?

R-trees - variations
A: R*-trees [Beckmann+, SIGMOD90] (Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger). Popular.
- defer splits, by forced-reinsert, i.e.: instead of splitting, temporarily delete some entries, shrink the overflowing MBR, and re-insert those entries
- Which ones to re-insert?
- How many? A: 30%
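
A minimal sketch of choosing the entries for forced reinsert. The slides leave "which ones" open; the rule used here (rank entries by the distance of their centers from the center of the node's MBR and take the farthest 30%) is the one I recall from the R*-tree paper and should be read as an assumption, as should the function names and sample data. Rectangles are tuples `(x_low, y_low, x_high, y_high)`.

```python
import math


def center(m):
    return ((m[0] + m[2]) / 2, (m[1] + m[3]) / 2)


def enclose(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pick_for_reinsert(rects, percent=30):
    """Split an overflowing node's entries into (to_reinsert, to_keep)."""
    cx, cy = center(enclose(rects))

    def dist_from_center(r):
        x, y = center(r)
        return math.hypot(x - cx, y - cy)

    ranked = sorted(rects, key=dist_from_center, reverse=True)  # farthest first
    p = max(1, len(rects) * percent // 100)
    return ranked[:p], ranked[p:]


# Made-up overflowing node: one entry stretches the MBR far to the right.
rects = [(0, 0, 2, 2), (4, 0, 5, 1), (5, 0, 6, 1), (9, 0, 10, 1)]
removed, kept = pick_for_reinsert(rects)
shrunk = enclose(kept)  # node MBR after the removal
```

The removed entries would then be inserted again from the root, which may place them in a different node and avoid the split.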

R-trees - variations
Q: Other ways to defer splits?
A: Push a few keys to the closest sibling node (closest = ??)

R-trees - variations
R*-trees: also try to minimize area AND perimeter in their split.
Performance: higher space utilization; faster than plain R-trees. One of the most successful R-tree variants.

R-trees - variations
What about static datasets (no ins/del/upd)? -> Hilbert R-trees
Q: Best way to pack points?
A1: plane-sweep: great for queries on ‘x’; terrible for ‘y’
Q: how to improve?

R-trees - variations
A: plane-sweep on HILBERT curve!
In fact, it can be made dynamic (how?), and it can also handle regions (how?)
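
Packing along the Hilbert curve can be sketched as: compute each point's Hilbert value, sort, and cut the sorted list into pages of `fanout` points. The conversion below is the standard iterative grid-to-curve mapping for an n x n grid with n a power of two; the function names, the integer grid, and the sample points are assumptions, and only the leaf level is built.

```python
def hilbert_value(n, x, y):
    """Position of cell (x, y) along the Hilbert curve of an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def pack_leaves(points, n, fanout):
    """Sort points by Hilbert value and group them into leaves of size `fanout`."""
    ordered = sorted(points, key=lambda p: hilbert_value(n, p[0], p[1]))
    leaves = []
    for i in range(0, len(ordered), fanout):
        group = ordered[i:i + fanout]
        xs = [p[0] for p in group]
        ys = [p[1] for p in group]
        leaves.append(((min(xs), min(ys), max(xs), max(ys)), group))
    return leaves


# All four cells of a 2 x 2 grid, packed two points per leaf.
leaves = pack_leaves([(1, 0), (0, 0), (1, 1), (0, 1)], n=2, fanout=2)
```

Unlike a sweep on ‘x’ alone, consecutive Hilbert values tend to be close in both coordinates, so the resulting leaf MBRs are more square-like.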

R-trees - variations
Dynamic (‘Hilbert R-tree’):
- each point has an ‘h’-value (hilbert value)
- insertions: like a B-tree on the h-value
- but also store MBR, for searches

R-trees - variations (details)
Data structure of a node? Each entry holds: LHV (largest Hilbert value in the subtree); x-low, y-low; x-high, y-high; ptr.
- h-values in the subtree are <= LHV (~ B-tree)
- MBRs: inside parent MBR (~ R-tree)

R-trees - variations (details)
Guttman’s R-trees sparked much follow-up work:
- can we do better splits?
- what about static datasets (no ins/del/upd)?
  - Hilbert R-trees - main idea
  - handling regions
  - performance/discussion
- what about other bounding shapes?

R-trees - variations (details)
What if we have regions, instead of points? I.e., how to impose a linear ordering (‘h-value’) on rectangles?
- A1: h-value of center
- A2: h-value of 4-d point (center, x-radius, y-radius)
- A3: ...

R-trees - variations (details)
- with h-values, we can have deferred splits, 2-to-3 splits (3-to-4, etc.)
- experimentally: faster than R*-trees (reference: [Kamel Faloutsos vldb 94])

R-trees - variations
What about other bounding shapes? (and why?)
- A1: arbitrary-orientation lines (cell-tree, [Guenther])
- A2: P-trees (polygon trees) (MB polygon: 0, 90, 45, 135 degree lines)
- A3: L-shapes; holes (hB-tree)
- A4: TV-trees [Lin+, VLDB-Journal 1994]
- A5: SR-trees [Katayama+, SIGMOD97] (used in Informedia)

Indexing - Detailed outline
- spatial access methods
  - problem dfn
  - z-ordering
  - R-trees
  - misc topics
    - grid files
    - dimensionality curse
    - metric trees
    - other nn methods
- text
- ...

R-trees - conclusions
- Popular method; like multi-d B-trees
- guaranteed utilization
- good search times (for low-dim. at least)
- Informix (-> IBM-DB2) ships DataBlade with R-trees
- R* variation is popular

References
- Beckmann, N., H.-P. Kriegel, R. Schneider, B. Seeger (1990). The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. ACM SIGMOD 1990.
- Guttman, A. (June 1984). R-Trees: A Dynamic Index Structure for Spatial Searching. Proc. ACM SIGMOD, Boston, Mass.
- Jagadish, H. V. (May 23-25, 1990). Linear Clustering of Objects with Multiple Attributes. ACM SIGMOD Conf., Atlantic City, NJ.
- Kamel, I., C. Faloutsos (1993). On Packing R-trees. CIKM.
- Lin, K.-I., H. V. Jagadish, et al. (Oct. 1994). The TV-tree - An Index Structure for High-dimensional Data. VLDB Journal 3.
- Pagel, B., H. Six, et al. (May 1993). Towards an Analysis of Range Query Performance. Proc. of ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), Washington, D.C.
- Robinson, J. T. (1981). The k-D-B-Tree: A Search Structure for Large Multidimensional Dynamic Indexes. Proc. ACM SIGMOD.
- Roussopoulos, N., S. Kelley, et al. (May 1995). Nearest Neighbor Queries. Proc. of ACM-SIGMOD, San Jose, CA.

Other resources
- Code, papers, datasets etc:
- Java applets and more info: donar.umiacs.umd.edu/quadtree/points/rtrees.html