C. Faloutsos Spatial Access Methods - R-trees


Carnegie Mellon Univ., Dept. of Computer Science. 15-415 - Database Applications. C. Faloutsos. Spatial Access Methods - R-trees

General Overview: Relational model - SQL; db design; Indexing; Q-opt; Transaction processing; Advanced topics: Distributed Databases; RAID; Authorization / Stat. DB; Spatial Access Methods (SAMs); Multimedia Indexing. 15-415 - C. Faloutsos

SAMs - Detailed outline: spatial access methods: problem definition; z-ordering; R-trees.

SAMs - more detailed outline: R-trees: main idea; file structure; (algorithms: insertion/split); (deletion); (search: range, nn, spatial joins); variations (packed; Hilbert; ...).

Reminder: problem. Given a collection of geometric objects (points, lines, polygons, ...), organize them on disk to answer spatial queries (range, nn, etc.).

R-trees: z-ordering cuts regions into pieces -> duplicate elimination is needed. How could we avoid that? Idea: Minimum Bounding Rectangles.

R-trees [Guttman 84]. Main idea: allow parents to overlap! => guaranteed 50% utilization => easier insertion/split algorithms. (Only deal with Minimum Bounding Rectangles - MBRs.)

R-trees, e.g., with fanout 4: group nearby rectangles into parent MBRs; each group -> disk page. [Figure: rectangles A-J]

R-trees, e.g., with fanout 4. [Figure: rectangles A-J grouped under parent MBRs P1-P4; leaf nodes shown: (A, B, C) and (H, I, J)]

R-trees, e.g., with fanout 4. [Figure: rectangles A-J grouped under parent MBRs P1-P4; root node with entries P1, P2, P3, P4]

R-trees - format of nodes: {(MBR; obj-ptr)} for leaf nodes. Each entry holds x-low, x-high; y-low, y-high; ...; and an obj ptr. [Figure: root entries P1-P4 above leaf entries A, B, C, ...]

R-trees - format of nodes: {(MBR; node-ptr)} for non-leaf nodes. Each entry holds x-low, x-high; y-low, y-high; ...; and a node ptr. [Figure: root entries P1-P4 above leaf entries A, B, C]
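
The two node formats above can be sketched as a pair of record types. This is only a minimal in-memory sketch, not the on-disk layout of any real system: the names `MBR`, `Entry`, `Node`, `obj_ptr` and `child` are illustrative assumptions, and a disk-based implementation would store page ids instead of Python references.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle (2-d for brevity)."""
    x_low: float
    x_high: float
    y_low: float
    y_high: float

    def union(self, other: "MBR") -> "MBR":
        # Smallest rectangle that covers both inputs.
        return MBR(min(self.x_low, other.x_low), max(self.x_high, other.x_high),
                   min(self.y_low, other.y_low), max(self.y_high, other.y_high))


@dataclass
class Entry:
    mbr: MBR
    obj_ptr: Optional[str] = None   # set in leaf entries: (MBR; obj-ptr)
    child: Optional["Node"] = None  # set in non-leaf entries: (MBR; node-ptr)


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

    def mbr(self) -> MBR:
        # The parent keeps this rectangle next to its pointer to the node.
        box = self.entries[0].mbr
        for entry in self.entries[1:]:
            box = box.union(entry.mbr)
        return box


# Hypothetical data: a leaf holding objects A and B, and a root pointing to it.
leaf = Node(is_leaf=True, entries=[Entry(MBR(0, 1, 0, 1), obj_ptr="A"),
                                   Entry(MBR(2, 3, 1, 4), obj_ptr="B")])
root = Node(is_leaf=False, entries=[Entry(leaf.mbr(), child=leaf)])
print(root.entries[0].mbr)
```

Using one `Entry` type with two optional pointer fields keeps the sketch short; separate leaf and non-leaf entry types would be an equally reasonable choice.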

R-trees - range search? [Figure: rectangles A-J under parent MBRs P1-P4, with the corresponding tree nodes]

R-trees - range search. Observations: every parent node completely covers its ‘children’; a child MBR may be covered by more than one parent, but it is stored under ONLY ONE of them (i.e., no need for duplicate elimination).

R-trees - range search. Observations, cont’d: a point query may follow multiple branches; everything works for any dimensionality.

SAMs - more detailed outline: R-trees: main idea; file structure; algorithms: insertion/split; deletion; search: range, nn, spatial joins; performance analysis; variations (packed; Hilbert; ...).

R-trees - insertion, e.g., rectangle ‘X’. [Figure: X among rectangles A-J under parent MBRs P1-P4; leaf entries shown: D, E, F, G]

R-trees - insertion, e.g., rectangle ‘X’. [Figure: same layout after the insertion; leaf entries shown: D, E, X, F, G]

R-trees - insertion, e.g., rectangle ‘Y’. [Figure: Y among rectangles A-J; leaf entries shown: D, E, F, G]

R-trees - insertion, e.g., rectangle ‘Y’: extend suitable parent. [Figure: leaf entries shown: A, B, C, H, I, J and D, E, F, G, Y]

R-trees - insertion, e.g., rectangle ‘Y’: extend suitable parent. Q: how to measure ‘suitability’? A: by increase in area (volume) (more details: later, under ‘performance analysis’). Q: what if there is no room? How to split?
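
The ‘suitability’ measure can be sketched as follows: for each candidate parent, compute how much its MBR would have to grow to cover the new rectangle, and pick the smallest growth. This is a minimal sketch under assumptions: rectangles are `(x_low, y_low, x_high, y_high)` tuples, the helper names are made up for illustration, and ties are broken by the smaller current area.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    # Smallest rectangle covering both a and b.
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(parent: Rect, new: Rect) -> float:
    # Increase in area needed for `parent` to cover `new`.
    return area(union(parent, new)) - area(parent)


def choose_parent(parents: Dict[str, Rect], new: Rect) -> str:
    # Least enlargement wins; ties go to the parent with the smaller area.
    return min(parents, key=lambda name: (enlargement(parents[name], new),
                                          area(parents[name])))


# Hypothetical parents P1 and P2 and a new rectangle Y that overlaps P1's edge.
parents = {"P1": (0, 0, 4, 4), "P2": (5, 0, 9, 4)}
y = (3.5, 1, 4.5, 2)
print(choose_parent(parents, y))  # P1 grows by 2, P2 would grow by 6
```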

R-trees - insertion, e.g., rectangle ‘W’. [Figure: W and K among the rectangles under parent MBRs P1 and P3; leaf entries shown: D, E, F, G]

R-trees - insertion, e.g., rectangle ‘W’ - focus on ‘P1’ - how to split? (A1: plane sweep, until 50% of rectangles); A2: ‘linear’ split; A3: quadratic split; A4: exponential split. [Figure: P1 containing K, C, A, W, B]

R-trees - insertion & split: pick two rectangles as ‘seeds’; assign each rectangle ‘R’ to the ‘closest’ ‘seed’. Q: how to measure ‘closeness’? A: by increase of area (volume). Smart idea: pre-sort rectangles according to delta of closeness (i.e., schedule the easiest choices first!). [Figure: rectangle R between seed1 and seed2]

R-trees - insertion - pseudocode: decide which parent to put the new rectangle into (‘closest’ parent); if overflow, split into two, using (say) the quadratic split algorithm; propagate the split upwards, if necessary; update the MBRs of the affected parents.
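
The split step can be sketched along the lines of the quadratic split: pick as seeds the pair that would waste the most area if kept together, then repeatedly assign the rectangle whose preference between the two groups is strongest. This is a hedged sketch, not a full insertion routine: the tuple layout, the function names, and the `min_fill` parameter (the minimum number of entries per node) are assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def quadratic_split(rects: List[Rect], min_fill: int):
    """Split an overflowing node; returns (two index groups, their two MBRs)."""
    def waste(pair):
        a, b = rects[pair[0]], rects[pair[1]]
        return area(union(a, b)) - area(a) - area(b)

    # Seeds: the pair that would waste the most area if put in one node.
    seed1, seed2 = max(combinations(range(len(rects)), 2), key=waste)
    groups = [[seed1], [seed2]]
    boxes = [rects[seed1], rects[seed2]]
    remaining = [i for i in range(len(rects)) if i not in (seed1, seed2)]

    def growth(i, g):
        # 'Closeness' of rectangle i to group g: increase in area.
        return area(union(boxes[g], rects[i])) - area(boxes[g])

    def assign(i, g):
        groups[g].append(i)
        boxes[g] = union(boxes[g], rects[i])
        remaining.remove(i)

    while remaining:
        # If a group needs every remaining entry to reach min_fill, give them all.
        short = [g for g in (0, 1) if len(groups[g]) + len(remaining) <= min_fill]
        if short:
            for i in list(remaining):
                assign(i, short[0])
            break
        # Easiest choice first: the entry with the largest difference in growth.
        nxt = max(remaining, key=lambda i: abs(growth(i, 0) - growth(i, 1)))
        target = min((0, 1), key=lambda g: (growth(nxt, g), area(boxes[g]),
                                            len(groups[g])))
        assign(nxt, target)
    return groups, boxes


# Hypothetical overflowing node: three rectangles near the origin, two far away.
rects = [(0, 0, 1, 1), (0.5, 0.5, 1.5, 1.5), (10, 10, 11, 11),
         (10.5, 10, 11.5, 11), (1, 0, 2, 1)]
groups, boxes = quadratic_split(rects, min_fill=2)
print(groups, boxes)
```

Seed selection examines every pair of entries, which is where the ‘quadratic’ cost comes from; for a node of a few dozen entries this is usually acceptable.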

R-trees - insertion - observations: many more split algorithms exist (next!).

SAMs - more detailed outline: R-trees: main idea; file structure; algorithms: insertion/split; deletion; search: range, nn, spatial joins; performance analysis; variations (packed; Hilbert; ...).

R-trees - deletion: delete rectangle; if underflow, temporarily delete all siblings (!), delete the parent node, and re-insert them.

SAMs - more detailed outline: R-trees: main idea; file structure; algorithms: insertion/split; deletion; search: range, nn, spatial joins; performance analysis; variations (packed; Hilbert; ...).

R-trees - range search pseudocode: check the root; for each branch, if its MBR intersects the query rectangle, apply range-search recursively (or print out, if this is a leaf).
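
The range-search pseudocode translates almost line by line into a recursive function. The sketch below assumes a simple in-memory node with an `is_leaf` flag and a list of `(rectangle, item)` pairs, where `item` is an object id in a leaf and a child node otherwise; these names are illustrative, not part of the original slides.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


def intersects(a: Rect, b: Rect) -> bool:
    # Two rectangles intersect iff they overlap on every axis.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Node:
    def __init__(self, is_leaf: bool, entries: list):
        self.is_leaf = is_leaf
        self.entries = entries  # [(rect, obj_id)] in leaves, [(rect, Node)] otherwise


def range_search(node: Node, query: Rect, out: List[str] = None) -> List[str]:
    if out is None:
        out = []
    for rect, item in node.entries:
        if intersects(rect, query):
            if node.is_leaf:
                out.append(item)                # report the object
            else:
                range_search(item, query, out)  # descend into this branch
    return out


# Hypothetical two-leaf tree.
leaf1 = Node(True, [((0, 0, 1, 1), "A"), ((2, 2, 3, 3), "B")])
leaf2 = Node(True, [((5, 5, 6, 6), "C")])
root = Node(False, [((0, 0, 3, 3), leaf1), ((5, 5, 6, 6), leaf2)])
print(range_search(root, (0.5, 0.5, 2.5, 2.5)))  # ['A', 'B']; leaf2 is never visited
```

Because parent MBRs may overlap, more than one branch can qualify at the same level, which is why the loop does not stop at the first match.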

R-trees - nn search. [Figure: rectangles A-J under parent MBRs P1-P4, and query point q]

R-trees - nn search. Q: How? (find near neighbor; refine...)

R-trees - nn search. A1: depth-first search; then, range query. [Figure: query point q near P1, P2 and P4]

R-trees - nn search. A2: [Roussopoulos+, SIGMOD95]. Main idea: priority queue with promising MBRs, and their best- and worst-case distances.

R-trees - nn search: consider only P2 and P4, for illustration.

R-trees - nn search: if the best-case distance of P4 is larger than the worst-case distance of P2, then P4 is useless for 1-nn. [Figure: q with P2 and P4; ‘best of P4’ and ‘worst of P2’ marked]

R-trees - nn search: what is really the worst of, say, P2? A: the smallest of the two red segments! [Figure: q, P2, and two red segments]
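
The best-case and worst-case distances can be sketched as the MINDIST and MINMAXDIST bounds used in [Roussopoulos+]. This is a sketch under assumptions: a rectangle is given as two corner tuples `lo` and `hi`, the query `q` is a point, and the function names are chosen here for illustration. MINMAXDIST relies on every face of an MBR touching at least one object, so for each axis it takes the nearer face on that axis and the farthest corner of that face, then keeps the smallest result (in 2-d: the smaller of two candidates).

```python
import math
from typing import Sequence


def mindist(q: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> float:
    # Best case: distance from q to the closest point of the rectangle.
    return math.sqrt(sum(max(l - x, 0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))


def minmaxdist(q: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> float:
    # Worst case for the nearest object inside the rectangle.
    near = [l if x <= (l + h) / 2 else h for x, l, h in zip(q, lo, hi)]
    far = [l if x >= (l + h) / 2 else h for x, l, h in zip(q, lo, hi)]
    best = min(
        sum((q[i] - (near[i] if i == k else far[i])) ** 2 for i in range(len(q)))
        for k in range(len(q))
    )
    return math.sqrt(best)


def useless_for_1nn(q, candidate, other) -> bool:
    # `candidate` can be pruned if even its best case is worse than
    # the guaranteed worst case of `other`.
    return mindist(q, *candidate) > minmaxdist(q, *other)


# Hypothetical query point and rectangles, each given as (lo, hi).
q = (0, 0)
p2 = ((3, 4), (5, 6))
p4 = ((20, 20), (22, 25))
print(mindist(q, *p2))             # 5.0
print(useless_for_1nn(q, p4, p2))  # True
```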

R-trees - nn search, variations: [Hjaltason & Samet] incremental nn: build a priority queue; scan enough of the tree to make sure you have the k nn; to find the (k+1)-th, check the queue and scan some more of the tree. ‘Optimal’ (but may need too much memory).
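
The incremental idea can be sketched with a single priority queue that holds both nodes and objects, keyed by their best-case distance from the query. The sketch below is written as a generator, so asking for one more neighbor simply resumes the scan. It is a simplified illustration under assumptions: objects are represented by their rectangles, nodes carry an `is_leaf` flag and `(rectangle, item)` entries, and all names are made up here.

```python
import heapq
import itertools
import math
from typing import Iterator, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


class Node:
    def __init__(self, is_leaf: bool, entries: list):
        self.is_leaf = is_leaf
        self.entries = entries  # [(rect, obj_id)] in leaves, [(rect, Node)] otherwise


def mindist(q: Tuple[float, float], r: Rect) -> float:
    dx = max(r[0] - q[0], 0, q[0] - r[2])
    dy = max(r[1] - q[1], 0, q[1] - r[3])
    return math.hypot(dx, dy)


def incremental_nn(root: Node, q: Tuple[float, float]) -> Iterator[Tuple[float, str]]:
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(0.0, next(tie), False, root)]
    while heap:
        dist, _, is_object, item = heapq.heappop(heap)
        if is_object:
            # Nothing left in the queue can be closer, so this is the next neighbor.
            yield dist, item
        else:
            for rect, child in item.entries:
                heapq.heappush(heap, (mindist(q, rect), next(tie), item.is_leaf, child))


# Hypothetical tree over three point objects.
leaf1 = Node(True, [((1, 1, 1, 1), "A"), ((4, 4, 4, 4), "B")])
leaf2 = Node(True, [((2, 2, 2, 2), "C")])
root = Node(False, [((1, 1, 4, 4), leaf1), ((2, 2, 2, 2), leaf2)])
for dist, name in itertools.islice(incremental_nn(root, (0, 0)), 2):
    print(name, round(dist, 3))  # A first, then C
```

The queue can grow large on big trees, which matches the caveat about memory.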

SAMs - more detailed outline: R-trees: main idea; file structure; algorithms: insertion/split; deletion; search: range, nn, spatial joins; performance analysis; variations (packed; Hilbert; ...).

R-trees - spatial joins. Spatial joins: find (quickly) all counties intersecting lakes.

R-trees - spatial joins: assume that they are both organized in R-trees.

R-trees - spatial joins: for each parent P1 of tree T1, for each parent P2 of tree T2: if their MBRs intersect, process them recursively (i.e., check their children).
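
The nested-loop join above can be sketched as a synchronized descent of the two trees. This sketch makes a simplifying assumption that is not in the slides: both trees have the same height, so the recursion reaches the leaves of both at the same time. The node layout and names are illustrative, and the result is a list of candidate pairs whose MBRs intersect (a refinement step on the exact geometries would follow).

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Node:
    def __init__(self, is_leaf: bool, entries: list):
        self.is_leaf = is_leaf
        self.entries = entries  # [(rect, obj_id)] in leaves, [(rect, Node)] otherwise


def spatial_join(n1: Node, n2: Node, out: List[Tuple[str, str]] = None):
    # Assumption: n1 and n2 are at the same level of equally tall trees.
    if out is None:
        out = []
    for r1, c1 in n1.entries:
        for r2, c2 in n2.entries:
            if intersects(r1, r2):
                if n1.is_leaf and n2.is_leaf:
                    out.append((c1, c2))       # candidate pair
                else:
                    spatial_join(c1, c2, out)  # check their children
    return out


# Hypothetical data: one tree of counties, one tree of lakes.
counties = Node(False, [((0, 0, 5, 5),
                         Node(True, [((0, 0, 2, 2), "county1"),
                                     ((3, 3, 5, 5), "county2")]))])
lakes = Node(False, [((1, 1, 9, 9),
                      Node(True, [((1, 1, 1.5, 1.5), "lake1"),
                                  ((8, 8, 9, 9), "lake2")]))])
print(spatial_join(counties, lakes))  # [('county1', 'lake1')]
```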

R-trees - spatial joins. Improvements - variations: [Seeger+, sigmod 92]: do some pre-filtering; do plane-sweeping to avoid N1 * N2 tests for intersection. [Lo & Ravishankar, sigmod 94]: ‘seeded’ R-trees. (FYI, many more papers on spatial joins without R-trees: [Koudas+ Sevcik], etc.)

SAMs - more detailed outline: R-trees: main idea; file structure; algorithms: insertion/split; deletion; search: range, nn, spatial joins; variations (packed; Hilbert; ...).

R-trees - variations: Guttman’s R-trees sparked much follow-up work: can we do better splits? what about static datasets (no ins/del/upd)? what about other bounding shapes?

R-trees - variations: Guttman’s R-trees sparked much follow-up work: can we do better splits? i.e., defer splits?

R-trees - variations. A: R*-trees [Kriegel+, SIGMOD90]: defer splits by forced-reinsert, i.e., instead of splitting, temporarily delete some entries, shrink the overflowing MBR, and re-insert those entries. Which ones to re-insert? How many? A: 30%.
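
The selection step of forced reinsertion can be sketched as follows. The slide gives only the fraction (30%); the rule used here for *which* entries to remove, namely those whose centers lie farthest from the center of the node's MBR, is an assumption of this sketch, as are the function names and the tuple layout. The removed entries would then be re-inserted from the root by the normal insertion routine.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def pick_for_reinsert(rects: List[Rect], fraction: float = 0.30):
    """Return (entries to re-insert, entries that stay in the node)."""
    box = rects[0]
    for r in rects[1:]:
        box = union(box, r)
    cx, cy = center(box)

    def dist_from_center(r: Rect) -> float:
        x, y = center(r)
        return math.hypot(x - cx, y - cy)

    # Assumed rule: the entries farthest from the node's center go first.
    ranked = sorted(rects, key=dist_from_center, reverse=True)
    k = max(1, round(fraction * len(rects)))
    return ranked[:k], ranked[k:]


# Hypothetical overflowing node: ten unit squares in a row.
node = [(i, 0, i + 1, 1) for i in range(10)]
removed, kept = pick_for_reinsert(node)
print(len(removed), len(kept))  # 3 7
```

Removing outlying entries lets the node's MBR shrink, which is the point of deferring the split.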

R-trees - variations. Q: Other ways to defer splits? A: Push a few keys to the closest sibling node (closest = ??).

R-trees - variations. R*-trees: also try to minimize area AND perimeter in their split. Performance: higher space utilization; faster than plain R-trees. One of the most successful R-tree variants.

R-trees - variations: Guttman’s R-trees sparked much follow-up work: can we do better splits? what about static datasets (no ins/del/upd)? Hilbert R-trees. What about other bounding shapes?

R-trees - variations: what about static datasets (no ins/del/upd)? Q: Best way to pack points? A1: plane-sweep: great for queries on ‘x’; terrible for ‘y’. Q: how to improve?

R-trees - variations. A: plane-sweep on HILBERT curve! In fact, it can be made dynamic (how?), as well as to handle regions (how?). A: [Kamel+, VLDB94].
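
Packing along the Hilbert curve can be sketched for the static, points-only case: map each point to its position on the curve, sort, and cut the sorted list into leaves of `fanout` points each. This is a sketch under assumptions: points have integer coordinates on an `n` x `n` grid with `n` a power of two, `hilbert_index` is the standard iterative coordinate-to-curve-position conversion, and the function names are illustrative. It does not cover the dynamic or region-handling versions mentioned above.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (x_low, y_low, x_high, y_high)


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve on an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def pack_leaves(points: List[Point], n: int, fanout: int) -> List[Tuple[Rect, List[Point]]]:
    # 'Plane sweep' along the curve: sort by Hilbert value, then fill leaves in order.
    ordered = sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
    leaves = []
    for i in range(0, len(ordered), fanout):
        group = ordered[i:i + fanout]
        xs = [p[0] for p in group]
        ys = [p[1] for p in group]
        leaves.append(((min(xs), min(ys), max(xs), max(ys)), group))
    return leaves


# Hypothetical data: every cell of a 4 x 4 grid, packed with fanout 4.
grid = [(x, y) for x in range(4) for y in range(4)]
for mbr, group in pack_leaves(grid, n=4, fanout=4):
    print(mbr, group)
```

Sorting by x alone would give tall, thin leaves that serve queries on ‘y’ poorly; on this example the Hilbert order yields four square leaves instead.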

R-trees - variations: Guttman’s R-trees sparked much follow-up work: can we do better splits? what about static datasets (no ins/del/upd)? what about other bounding shapes?

R-trees - variations: what about other bounding shapes? (and why?) A1: arbitrary-orientation lines (cell-tree, [Guenther]). A2: P-trees (polygon trees) (MB polygon: 0, 90, 45, 135 degree lines).

R-trees - variations. A3: L-shapes; holes (hB-tree). A4: TV-trees [Lin+, VLDB-Journal 1994]. A5: SR-trees [Katayama+, SIGMOD97] (used in Informedia).

R-trees - conclusions: popular method; like multi-d B-trees; guaranteed utilization; good search times (for low-dim. at least); R*-, Hilbert- and SR-trees: still used; Informix ships DataBlade with R-trees.

References
Guttman, A. (June 1984). R-Trees: A Dynamic Index Structure for Spatial Searching. Proc. ACM SIGMOD, Boston, Mass.
Jagadish, H. V. (May 23-25, 1990). Linear Clustering of Objects with Multiple Attributes. ACM SIGMOD Conf., Atlantic City, NJ.
Lin, K.-I., H. V. Jagadish, et al. (Oct. 1994). The TV-tree - An Index Structure for High-dimensional Data. VLDB Journal 3: 517-542.

References, cont’d
Pagel, B., H. Six, et al. (May 1993). Towards an Analysis of Range Query Performance. Proc. of ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), Washington, D.C.
Robinson, J. T. (1981). The k-D-B-Tree: A Search Structure for Large Multidimensional Dynamic Indexes. Proc. ACM SIGMOD.
Roussopoulos, N., S. Kelley, et al. (May 1995). Nearest Neighbor Queries. Proc. of ACM-SIGMOD, San Jose, CA.