Accessing Spatial Data

Slides:



Advertisements
Similar presentations
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Advertisements

Chapter 4: Trees Part II - AVL Tree
I/O-Algorithms Lars Arge Fall 2014 September 25, 2014.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part C Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
Dr. Kalpakis CMSC 661, Principles of Database Systems Index Structures [13]
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
COMP 451/651 Indexes Chapter 1.
2-dimensional indexing structure
CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010.
CSE332: Data Abstractions Lecture 9: BTrees Tyler Robison Summer
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Spatial Access Methods Chapter 26 of book Read only 26.1, 26.2, 26.6 Dr Eamonn Keogh Computer Science & Engineering Department University of California.
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
Spatial Indexing SAMs.
1 R-Trees for Spatial Indexing Yanlei Diao UMass Amherst Feb 27, 2007 Some Slide Content Courtesy of J.M. Hellerstein.
Chapter 3: Data Storage and Access Methods
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Spatio-Temporal Databases. Introduction Spatiotemporal Databases: manage spatial data whose geometry changes over time Geometry: position and/or extent.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses.
R-Trees 2-dimensional indexing structure. R-trees 2-dimensional version of the B-tree: B-tree of maximum degree 8; degree between 3 and 8 Internal nodes.
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD Shahram Ghandeharizadeh Computer Science Department University of.
Birch: An efficient data clustering method for very large databases
CS4432: Database Systems II
Tree-Structured Indexes. Range Searches ``Find all students with gpa > 3.0’’ –If data is in sorted file, do binary search to find first such student,
R-Trees: A Dynamic Index Structure for Spatial Data Antonin Guttman.
R-Trees Extension of B+-trees.  Collection of d-dimensional rectangles.  A point in d-dimensions is a trivial rectangle.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
Spatial Data Management Chapter 28. Types of Spatial Data Point Data –Points in a multidimensional space E.g., Raster data such as satellite imagery,
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.
COSC 2007 Data Structures II Chapter 15 External Methods.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
R-Tree. 2 Spatial Database (Ia) Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick topological search.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
B-Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it.
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
Nearest Neighbor Queries Chris Buzzerd, Dave Boerner, and Kevin Stewart.
Bin Yao (Slides made available by Feifei Li) R-tree: Indexing Structure for Data in Multi- dimensional Space.
1 Multi-Level Indexing and B-Trees. 2 Statement of the Problem When indexes grow too large they have to be stored on secondary storage. However, there.
Spatial Indexing Techniques Introduction to Spatial Computing CSE 5ISC Some slides adapted from Spatial Databases: A Tour by Shashi Shekhar Prentice Hall.
R-Trees: A Dynamic Index Structure For Spatial Searching Antonin Guttman.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
R-T REES Accessing Spatial Data. I N THE BEGINNING … The B-Tree provided a foundation for R-Trees. But what’s a B-Tree? A data structure for storing sorted.
1 CSIS 7101: CSIS 7101: Spatial Data (Part 1) The R*-tree : An Efficient and Robust Access Method for Points and Rectangles Rollo Chan Chu Chung Man Mak.
R* Tree By Rohan Sadale Akshay Kulkarni.  Motivation  Optimization criteria for R* Tree  High level Algorithm  Example  Performance Agenda.
Jeremy Iverson & Zhang Yun 1.  Chapter 6 Key Concepts ◦ Structures and access methods ◦ R-Tree  R*-Tree  Mobile Object Indexing  Questions 2.
1 R-Trees Guttman. 2 Introduction Range queries in multiple dimensions: Computer Aided Design (CAD) Geo-data applications Support special data objects.
Spatial Data Management
B/B+ Trees 4.7.
Multiway Search Trees Data may not fit into main memory
Chapter 25: Advanced Data Types and New Applications
Spatial Indexing I Point Access Methods.
Spatial Indexing I R-trees
Database Design and Programming
Donghui Zhang, Tian Xia Northeastern University
Presentation transcript:

Accessing Spatial Data R-Trees Accessing Spatial Data

In the beginning… The B-Tree provided a foundation for R-Trees. But what’s a B-Tree? A data structure for storing sorted data with amortized run times for insertion and deletion Often used for data stored on long latency I/O (filesystems and DBs) because child nodes can be accessed together (since they are in order)

B-Tree From wikipedia

What’s wrong with B-Trees B-Trees cannot store new types of data Specifically people wanted to store geometrical data and multi-dimensional data The R-Tree provided a way to do that (thanx to Guttman ‘84)

R-Trees R-Trees can organize any-dimensional data by representing the data by a minimum bounding box. Each node bounds it’s children. A node can have many objects in it The leaves point to the actual objects (stored on disk probably) The height is always log n (it is height balanced)

R-Tree Example From http://lacot.org/public/enst/bda/img/schema1.gif

Operations Searching: look at all nodes that intersect, then recurse into those nodes. Many paths may lead nowhere Insertion: Locate place to insert node through searching and insert. If a node is full, then a split needs to be done Deletion: node becomes underfull. Reinsert other nodes to maintain balance.

Splitting Full Nodes Linear – choose far apart nodes as ends. Randomly choose nodes and assign them so that they require the smallest MBR enlargement Quadratic – choose two nodes so the dead space between them is maximized. Insert nodes so area enlargement is minimized Exponential – search all possible groupings Note: Only criteria is MBR area enlargement

Demo How can we visualize the R-Tree By clicking here

Variants - R+ Trees Avoids multiple paths during searching. Objects may be stored in multiple nodes MBRs of nodes at same tree level do not overlap On insertion/deletion the tree may change downward or upward in order to maintain the structure

R+ Tree http://perso.enst.fr/~saglio/bdas/EPFL0525/sld041.htm

Variants: Hilbert R-Tree Similar to other R-Trees except that the Hilbert value of its rectangle centroid is calculated. That key is used to guide the insertion On an overflow, evenly divide between two nodes Experiments has shown that this scheme significantly improves performance and decreases insertion complexity. Hilbert R-tree achieves up to 28% saving in the number of pages touched compared to R*-tree.

Hilbert Value?? The Hilbert value of an object is found by interleaving the bits of its x and y coordinates, and then chopping the binary string into 2-bit strings.  Then, for every 2-bit string, if the value is 0, we replace every 1 in the original string with a 3, and vice-versa.  If the value of the 2-bit string is 3, we replace all 2’s and 0’s in a similar fashion.  After this is done, you put all the 2-bit strings back together and compute the decimal value of the binary string; This is the Hilbert value of the object.  http://www-users.cs.umn.edu/research/shashi-group/CS8715/exercise_ans.doc

R*-Tree The original R-Tree only uses minimized MBR area to determine node splitting. There are other factors to consider as well that can have a great impact depending on the data By considering the other factors, R*-Trees become faster for spatial and point access queries.

Problems in original R-Tree Because the only criteria is to minimize area Certain types of data may create small areas but large distances which will initiate a bad split. If one group reaches a maximum number of entries, the rest of assigned without consideration of their geometry. Greene tried to solve, but he only used the “split axis” – more criteria needs to be used

Splitting overfilled nodes Why is this overfull?

R*-Tree Parameters Area covered by a rectangle should be minimized Overlap should be minimized The sum of the lengths of the edges (margins) should be minimized Storage utilization should be maximized (resulting in smaller tree height)

Splitting in R*-Trees Entries are sorted by their lower value, then their upper value of their rectangles. All possible distributions are determined Compute the sum of the margin values and choose the axis with the minimum as the split axis Along the split axis, choose the distribution with the minimum overlap Distribute entries into these two groups

Deleting and Forced Re-insertion Experimentally, it was shown that re-inserting data provided large (20-50%) improvement in performance. Thus, randomly deleting half the data and re-inserting is a good way to keep the structure balanced.

Results Lots of data sets and lots of query types. One example: Real Data: MBRs of elevation lines. 100K objects Disk accesses On insert Query Storage util. After build up

RC-Trees Changing motivations: Memory large enough to store objects It’s possible to store the object geometry and not just the MBR representation. Data is dynamic and transient Spatial objects naturally overlap (ie: stock market triggers)

RC-Trees Take advantage of dynamic segmentation If the original geometry is thrown away, then later on the MBR cannot be modified to represent new changes to the tree RC Tree does Clipping Domain Reduction Rebalancing

Discriminators A discriminator is used to decide (in binary) which direction a node should go in. (It means it’s a binary tree, unlike other R-Trees) It partitions the space If an object intersects a discriminator, the object can be clipped into two parts When an object is clipped, the space it takes up (in terms of its MBR) is reduced (aka domain reduction) This allows for removal of dead space and faster point query lookups

Domain Reduction and Clipping

Operations Insert, Delete and Search are straightforward What happens on an node that has been overflowed? Choose a discriminator to partition the object into balanced sets How is a discriminator chosen?

Partitioning Two methods for finding a discriminator for a partition RC-MID – faster, but ignores balancing and clipping. Uses pre-computed data to determine and average discriminator. Problems? Different distributions greatly affect partition Space requirements can be huge

Partitioning Take 2 RC-SWEEP Problems? sorts objects. Candidates for discriminators are the boundaries of the MBRs Assign a weight to each candidate using a formula not shown here Choose the minimum Problems? Slower, but space costs much better than RC-MID (which keeps info about nodes)

Rebuilding The tree can take a certain degree of flexibility in its structure before needing to be rebalanced On an insert, check if the height is too imbalanced If so, go to the imbalanced subtree and flush the items, sort and call split on them to get a better balancing

Experimentation CPU execution time not a good measure. (although they still calculate it) Instead use number of discriminators compared Lots of results Result summary: Insertion a little more expensive (because of possible rebalancing) Querying for point or spatial data faster (and fewer memory accesses) than all previous incarnations Storage requirements not that bad Dynamic segmentation (ie recalculating MBRs) can help a lot Controlling space with “γ” factor (by disallowing further splitting) controls space costs