Search Trees.



R-Trees: Introduction
To handle spatial data efficiently, as required in CAD and geo-data applications, a database system needs an index mechanism that helps it retrieve data items quickly according to their spatial locations.

R-Trees: R-Tree Index Structure
An R-tree is a height-balanced tree similar to a B-tree, with index records in its leaf nodes containing pointers to data objects.
Leaf nodes contain index record entries of the form (I, tuple-identifier), where tuple-identifier refers to a tuple in the database and I is an n-dimensional rectangle.
Non-leaf nodes contain entries of the form (I, child-pointer), where child-pointer is the address of a lower node in the R-tree and I covers all rectangles in the lower node's entries.

R-Trees: Properties of an R-Tree
Let M be the maximum number of entries that fit in one node, and let m <= M/2 be the minimum number of entries in a node.
Every leaf node contains between m and M index records unless it is the root.
For each index record (I, tuple-identifier) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object represented by the indicated tuple.
Every non-leaf node has between m and M children unless it is the root.
For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node.
The root node has at least two children unless it is a leaf.
All leaves appear on the same level.

R-Tree structure (figure)

R-Trees: Algorithm Search
Given an R-tree whose root node is T, find all index records whose rectangles overlap a search rectangle S. For an entry E, E.I denotes its rectangle and E.p its pointer.
S1 [Search subtrees] If T is not a leaf, check each entry E to determine whether E.I overlaps S. For all overlapping entries, invoke Search on the tree whose root node is pointed to by E.p.
S2 [Search leaf node] If T is a leaf, check all entries E to determine whether E.I overlaps S. If so, E is a qualifying record.
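The two steps of Search can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the 2-D tuple layout (xmin, ymin, xmax, ymax), the `Entry` and `Node` classes, and the sample tree are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

# Assumed layout for a 2-D rectangle: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Entry:
    I: Rect   # bounding rectangle
    p: Any    # child Node (non-leaf) or tuple identifier (leaf)


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def search(T: Node, S: Rect) -> list:
    """Return the tuple identifiers of all index records overlapping S."""
    hits = []
    for E in T.entries:
        if overlaps(E.I, S):
            if T.leaf:
                hits.append(E.p)              # S2: qualifying record
            else:
                hits.extend(search(E.p, S))   # S1: descend into the subtree
    return hits


# Hypothetical two-level tree used only to exercise the function.
leaf1 = Node(True, [Entry((0, 0, 1, 1), "a"), Entry((2, 2, 3, 3), "b")])
leaf2 = Node(True, [Entry((5, 5, 6, 6), "c")])
root = Node(False, [Entry((0, 0, 3, 3), leaf1), Entry((5, 5, 6, 6), leaf2)])
```

Note that, unlike a B-tree lookup, more than one subtree may have to be visited, because sibling rectangles are allowed to overlap.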

R-Trees: Algorithm Insert
Insert a new index entry E into an R-tree.
L1 [Find position for new record] Invoke ChooseLeaf to select a leaf node L in which to place E.
L2 [Add record to leaf node] If L has room for another entry, install E. Otherwise invoke SplitNode to obtain L and LL containing E and all the old entries of L.
L3 [Propagate changes upward] Invoke AdjustTree on L, also passing LL if a split was performed.
L4 [Grow tree taller] If node split propagation caused the root to split, create a new root whose children are the two resulting nodes.

R-Trees: Algorithm ChooseLeaf
Select a leaf node in which to place a new index entry E.
CL1 [Initialize] Set N to be the root node.
CL2 [Leaf check] If N is a leaf, return N.
CL3 [Choose subtree] If N is not a leaf, let F be the entry in N whose rectangle F.I needs least enlargement to include E.I. Resolve ties by choosing the entry with the rectangle of smallest area.
CL4 [Descend until a leaf is reached] Set N to be the child node pointed to by F.p and repeat from CL2.
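A short Python sketch of ChooseLeaf follows. The rectangle layout (xmin, ymin, xmax, ymax), the helper names `area`, `union` and `enlargement`, and the `Entry`/`Node` classes are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(r: Rect, e: Rect) -> float:
    """Extra area r would need in order to include e."""
    return area(union(r, e)) - area(r)


@dataclass
class Entry:
    I: Rect
    p: Any    # child Node (non-leaf) or tuple identifier (leaf)


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def choose_leaf(root: Node, EI: Rect) -> Node:
    N = root                                   # CL1
    while not N.leaf:                          # CL2
        # CL3: least enlargement first, smallest area as the tie-breaker
        F = min(N.entries, key=lambda f: (enlargement(f.I, EI), area(f.I)))
        N = F.p                                # CL4
    return N


# Hypothetical tree used only to exercise the function.
leaf_a = Node(True, [Entry((0, 0, 2, 2), "a")])
leaf_b = Node(True, [Entry((10, 10, 12, 12), "b")])
root = Node(False, [Entry((0, 0, 2, 2), leaf_a), Entry((10, 10, 12, 12), leaf_b)])
```

Sorting on the pair (enlargement, area) expresses the tie-breaking rule of CL3 in a single comparison key.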

R-Trees: Algorithm AdjustTree
Ascend from a leaf node L to the root, adjusting covering rectangles and propagating node splits as necessary.
AT1 [Initialize] Set N=L. If L was split previously, set NN to be the resulting second node.
AT2 [Check if done] If N is the root, stop.
AT3 [Adjust covering rectangle in parent entry] Let P be the parent node of N, and let En be N's entry in P. Adjust En.I so that it tightly encloses all entry rectangles in N.
AT4 [Propagate node split upward] If N has a partner NN resulting from an earlier split, create a new entry Enn with Enn.p pointing to NN and Enn.I enclosing all rectangles in NN. Add Enn to P if there is room. Otherwise, invoke SplitNode to produce P and PP containing Enn and all P's old entries.
AT5 [Move up to next level] Set N=P and set NN=PP if a split occurred. Repeat from AT2.

R-Trees: Algorithm Deletion
Remove index record E from an R-tree.
D1 [Find node containing record] Invoke FindLeaf to locate the leaf node L containing E. Stop if the record was not found.
D2 [Delete record] Remove E from L.
D3 [Propagate changes] Invoke CondenseTree, passing L.
D4 [Shorten tree] If the root node has only one child after the tree has been adjusted, make the child the new root.

R-Trees: Algorithm FindLeaf
Given an R-tree whose root node is T, find the leaf node containing the index entry E.
FL1 [Search subtrees] If T is not a leaf, check each entry F in T to determine if F.I overlaps E.I. For each such entry invoke FindLeaf on the tree whose root is pointed to by F.p until E is found or all entries have been checked.
FL2 [Search leaf node for record] If T is a leaf, check each entry to see if it matches E. If E is found, return T.

R-Trees: Algorithm CondenseTree
Given a leaf node L from which an entry has been deleted, eliminate the node if it has too few entries and relocate its entries.
CT1 [Initialize] Set N=L. Set Q, the set of eliminated nodes, to be empty.
CT2 [Find parent entry] If N is the root, go to CT6. Otherwise let P be the parent of N, and let En be N's entry in P.
CT3 [Eliminate under-full node] If N has fewer than m entries, delete En from P and add N to set Q.
CT4 [Adjust covering rectangle] If N has not been eliminated, adjust En.I to tightly contain all entries in N.
CT5 [Move up one level in the tree] Set N=P and repeat from CT2.
CT6 [Re-insert orphaned entries] Re-insert all entries of nodes in set Q. Entries from eliminated leaf nodes are re-inserted in tree leaves as described in Algorithm Insert, but entries from higher-level nodes must be placed higher in the tree, so that leaves of their dependent subtrees will be on the same level as leaves of the main tree.

R-Trees: Algorithm Quadratic Split
Divide a set of M+1 index entries into two groups.
QS1 [Pick first entry for each group] Apply PickSeeds to choose two entries to be the first elements of the groups. Assign each to a group.
QS2 [Check if done] If all entries have been assigned, stop. If one group has so few entries that all the rest must be assigned to it in order for it to have the minimum number m, assign them and stop.
QS3 [Select entry to assign] Invoke PickNext to choose the next entry to assign. Add it to the group whose covering rectangle will have to be enlarged least to accommodate it. Resolve ties by adding the entry to the group with smaller area, then to the one with fewer entries, then to either. Repeat from QS2.

R-Trees: Algorithm PickSeeds
Select two entries to be the first elements of the groups.
PS1 [Calculate inefficiency of grouping entries together] For each pair of entries E1 and E2, compose a rectangle J including E1.I and E2.I. Calculate d = area(J) - area(E1.I) - area(E2.I).
PS2 [Choose the most wasteful pair] Choose the pair with the largest d.
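PickSeeds examines every pair of entries, which is where the quadratic cost comes from. A minimal Python sketch, assuming rectangles are stored as (xmin, ymin, xmax, ymax) tuples and using hypothetical helper names:

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle J enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    """Return the indices of the pair that wastes the most area when grouped."""
    def waste(pair: Tuple[int, int]) -> float:
        i, j = pair
        J = union(rects[i], rects[j])                       # PS1
        return area(J) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)  # PS2


rects = [(0, 0, 1, 1), (1, 1, 2, 2), (9, 9, 10, 10)]
seeds = pick_seeds(rects)  # (0, 2): the two rectangles farthest apart
```

The pair with the largest d is the pair that would be least sensible to keep in the same node, so each member seeds a different group.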

R-Trees: Algorithm PickNext
Select one remaining entry for classification in a group.
PN1 [Determine cost of putting each entry in each group] For each entry E not yet in a group, calculate d1 = the area increase required in the covering rectangle of Group 1 to include E.I. Calculate d2 similarly for Group 2.
PN2 [Find entry with greatest preference for one group] Choose any entry with the maximum difference between d1 and d2.
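PickNext can be sketched the same way. The rectangle layout (xmin, ymin, xmax, ymax) and the function and parameter names below are assumptions made for the example; `cover1` and `cover2` stand for the current covering rectangles of the two groups.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(cover: Rect, e: Rect) -> float:
    """Area increase needed for cover to include e."""
    return area(union(cover, e)) - area(cover)


def pick_next(remaining: List[Rect], cover1: Rect, cover2: Rect) -> int:
    """Return the index of the unassigned entry with the largest |d1 - d2|."""
    def preference(i: int) -> float:
        d1 = enlargement(cover1, remaining[i])   # PN1
        d2 = enlargement(cover2, remaining[i])
        return abs(d1 - d2)
    return max(range(len(remaining)), key=preference)  # PN2


remaining = [(4, 4, 6, 6), (0, 0, 2, 2)]
choice = pick_next(remaining, (0, 0, 1, 1), (9, 9, 10, 10))
```

In this example the first rectangle sits midway between the groups (d1 equals d2), while the second clearly belongs with Group 1, so the second is picked first.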

Generalized Search Tree (GiST): Why GiST
Extensible both in the data types supported and in the queries applied to that data.
Allows new data types to be indexed in a manner that supports the queries natural to the data type.
Unifies previously disparate structures for currently common data types. For example, B+ trees and R-trees can be implemented as extensions of GiST.
Provides a single code base for indexing multiple dissimilar applications.

GiST: Definition
A GiST is a balanced multi-way tree of variable fan-out between kM and M, where k is the fill factor. The exception is the root node, which can have fan-out from 2 to M.
Leaf nodes hold entries (p, ptr), where ptr is the identifier of some tuple of the database.
Non-leaf nodes hold entries (p, ptr), where ptr is a pointer to another tree node.
In both cases p is a predicate used as a search key.

GiST: Properties
Every node contains between kM and M index entries unless it is the root.
For each index entry (p, ptr) in a leaf node, p holds for the tuple.
For each index entry (p, ptr) in a non-leaf node, p is true when instantiated with the values of any tuple reachable from ptr.
The root has at least two children unless it is a leaf.
All leaves appear on the same level.

GiST: Methods
Key Methods: the methods the user can specify to configure the GiST. They encapsulate the structure and behavior of the object class used for keys in the tree.
Tree Methods: provided by the GiST; they may invoke the required key methods.

GiST: Key Methods
E is an entry of the form (p, ptr), q is a query, and P is a set of entries.
Consistent(E, q): returns false if p ∧ q is guaranteed unsatisfiable, true otherwise.
Union(P): returns a predicate r that holds for all tuples stored below the entries in P.
Compress(E): returns (p', ptr), where p' is a compressed representation of p.
Decompress(E): returns (r, ptr) where p → r. The compression may be lossy, because we do not require p ↔ r.

GiST: Key Methods (continued)
Penalty(E1, E2): returns a domain-specific penalty for inserting E2 into the subtree rooted at E1. Typically the penalty metric represents the increase in size from E1.p to Union(E1, E2).
PickSplit(P): given a set P of M+1 entries, splits P into two sets of entries P1 and P2, each of size at least kM. The choice of the minimum fill factor is controlled here.

GiST: Tree Methods
Search: controlled by the Consistent method.
Insert: controlled by the Penalty and PickSplit methods.
Delete: controlled by the Consistent method.
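The way a tree method delegates to a key method can be sketched in Python. Here `gist_search` is a generic search that knows nothing about the key type and only calls a user-supplied `consistent` function. The class names, the function names, and the interval key class used as an example are assumptions for illustration, not part of any particular GiST implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Entry:
    p: Any     # predicate used as the search key
    ptr: Any   # child Node (non-leaf) or tuple identifier (leaf)


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def gist_search(node: Node, q: Any, consistent: Callable[[Entry, Any], bool]) -> list:
    """Generic Search: follow every entry whose key may be consistent with q."""
    out = []
    for E in node.entries:
        if consistent(E, q):
            if node.leaf:
                out.append(E.ptr)
            else:
                out.extend(gist_search(E.ptr, q, consistent))
    return out


# Example key class (assumed): p is a half-open interval (lo, hi) meaning
# [lo, hi), and the query q is a single value to look up.
def interval_consistent(E: Entry, q: float) -> bool:
    lo, hi = E.p
    return lo <= q < hi


leaf1 = Node(True, [Entry((1, 2), "t1"), Entry((5, 6), "t5")])
leaf2 = Node(True, [Entry((8, 9), "t8")])
root = Node(False, [Entry((1, 6), leaf1), Entry((8, 9), leaf2)])
```

Swapping in a different `consistent` function (for example, rectangle overlap) changes the kind of index without touching `gist_search`.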

Example (figure): a new entry (q, ptr) is inserted starting at root R. At each level the insertion descends into the entry with the smaller Penalty (m < n at the first level, j < i at the next). If the chosen node is full, it is split according to PickSplit.

Applications
GiST over Z (B+ trees)
GiST over polygons in R^2 (R-trees)

B+ Trees Using GiST
The predicate p is of the form Contains([xp, yp), v).
Consistent(E, q) returns true if:
If q = Contains([xq, yq), v): (xp < yq) ∧ (yp > xq)
If q = Equal(xq, v): xp ≤ xq < yp
Union(P) returns [MIN(x1, x2, …, xn), MAX(y1, y2, …, yn)).
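These key methods are small enough to write out directly. The sketch below represents a half-open interval [x, y) as a Python tuple (x, y); the function names are assumptions, and the Equal test assumes the comparison reads xp ≤ xq < yp, which is what a half-open interval implies.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (x, y) stands for the half-open range [x, y)


def consistent_contains(p: Interval, q: Interval) -> bool:
    """Range query: can [xp, yp) and [xq, yq) share a value?"""
    (xp, yp), (xq, yq) = p, q
    return xp < yq and yp > xq


def consistent_equal(p: Interval, xq: float) -> bool:
    """Point query: does xq fall inside [xp, yp)?"""
    xp, yp = p
    return xp <= xq < yp


def union(P: List[Interval]) -> Interval:
    """Smallest interval covering every interval in P."""
    return (min(x for x, _ in P), max(y for _, y in P))
```

Because the upper bound is exclusive, [1, 5) and [5, 9) are not consistent with each other, while [1, 5) and [4, 9) are.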

B+ Trees Using GiST: Penalty and PickSplit
Penalty(E, F), where E.p = Contains([x1, y1), v) and F.p = Contains([x2, y2), v):
If E is the leftmost pointer on its node, returns MAX(y2-y1, 0).
If E is the rightmost pointer on its node, returns MAX(x1-x2, 0).
Otherwise, returns MAX(y2-y1, 0) + MAX(x1-x2, 0).
PickSplit(P): let the first ⌊|P|/2⌋ entries in order go to the left node and the remaining entries go to the right node.
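A hedged Python sketch of these two methods follows. Intervals are (x, y) tuples standing for [x, y). The `leftmost` and `rightmost` flags are hypothetical parameters standing in for the position of E on its node, and splitting at the midpoint (first half left, rest right) is an assumption about how many entries go to each side.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (x, y) stands for the half-open range [x, y)


def penalty(E: Interval, F: Interval,
            leftmost: bool = False, rightmost: bool = False) -> float:
    """How far E = [x1, y1) must grow to cover F = [x2, y2)."""
    (x1, y1), (x2, y2) = E, F
    grow_right = max(y2 - y1, 0)
    grow_left = max(x1 - x2, 0)
    if leftmost:       # lower bound is effectively -infinity: growing left is free
        return grow_right
    if rightmost:      # upper bound is effectively +infinity: growing right is free
        return grow_left
    return grow_right + grow_left


def pick_split(P: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Sort the entries and cut the ordered list in the middle (assumed split point)."""
    ordered = sorted(P)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]
```

For example, `penalty((10, 20), (5, 25))` is 10 for an interior entry, but only 5 if the entry is leftmost or rightmost, since one direction of growth then costs nothing.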

B+ Trees Using GiST: Compress and Decompress
Compress(E): if E is the leftmost key on a non-leaf node, return 0 bytes; otherwise, return E.p.x.
Decompress(E): construct the interval [x, y).
If E is the leftmost key on a non-leaf node, let x = -∞; otherwise let x = E.p.x.
If E is the rightmost key on a non-leaf node, let y = ∞. If E is any other entry in a non-leaf node, let y be the value stored in the next key. Otherwise (E is in a leaf node), let y = x+1.

R-Trees Using GiST
The key is of the form (xul, yul, xlr, ylr), the upper-left and lower-right corners of a bounding rectangle.
Query predicates:
Contains((xul1, yul1, xlr1, ylr1), (xul2, yul2, xlr2, ylr2)) returns true if (xul1 ≤ xul2) ∧ (yul1 ≥ yul2) ∧ (xlr1 ≥ xlr2) ∧ (ylr1 ≤ ylr2)
Overlaps((xul1, yul1, xlr1, ylr1), (xul2, yul2, xlr2, ylr2)) returns true if (xul1 ≤ xlr2) ∧ (yul1 ≥ ylr2) ∧ (xul2 ≤ xlr1) ∧ (ylr1 ≤ yul2)
Equal((xul1, yul1, xlr1, ylr1), (xul2, yul2, xlr2, ylr2)) returns true if (xul1 = xul2) ∧ (yul1 = yul2) ∧ (xlr1 = xlr2) ∧ (ylr1 = ylr2)
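The three predicates translate directly into Python. This sketch assumes that y increases upward (so the upper-left corner has the larger y) and that the comparisons are the non-strict ≤ and ≥; the function names are chosen for the example.

```python
from typing import Tuple

# Key layout from the text: (xul, yul, xlr, ylr), with y assumed to grow upward.
Key = Tuple[float, float, float, float]


def contains(r1: Key, r2: Key) -> bool:
    """True if rectangle r1 fully encloses rectangle r2."""
    xul1, yul1, xlr1, ylr1 = r1
    xul2, yul2, xlr2, ylr2 = r2
    return xul1 <= xul2 and yul1 >= yul2 and xlr1 >= xlr2 and ylr1 <= ylr2


def overlaps(r1: Key, r2: Key) -> bool:
    """True if the two rectangles share at least one point."""
    xul1, yul1, xlr1, ylr1 = r1
    xul2, yul2, xlr2, ylr2 = r2
    return xul1 <= xlr2 and yul1 >= ylr2 and xul2 <= xlr1 and ylr1 <= yul2


def equal(r1: Key, r2: Key) -> bool:
    """True if all four coordinates match."""
    return tuple(r1) == tuple(r2)


big = (0, 10, 10, 0)     # spans x in [0, 10], y in [0, 10]
small = (2, 8, 4, 6)     # spans x in [2, 4],  y in [6, 8]
```

Any pair that satisfies Contains or Equal also satisfies Overlaps, which is why Overlaps alone can serve as the conservative test when descending the tree.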

R-Trees Using GiST: Consistent and Union
Consistent(E, q): E.p contains (xul1, yul1, xlr1, ylr1), and q is either Contains, Overlaps or Equal on (xul2, yul2, xlr2, ylr2). Returns true if Overlaps((xul1, yul1, xlr1, ylr1), (xul2, yul2, xlr2, ylr2)).
Union(P): returns the coordinates of the minimum bounding rectangle of all rectangles in P.

R-Trees Using GiST: Penalty and PickSplit
Penalty(E, F): compute q = Union(E, F) and return area(q) - area(E.p).
PickSplit(P): a variety of algorithms are available to best split the entries in an over-full node.
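The R-tree Penalty is the area by which a key must grow to absorb a new rectangle. A minimal Python sketch, assuming the key layout (xul, yul, xlr, ylr) with y increasing upward and using hypothetical function names:

```python
from typing import List, Tuple

Key = Tuple[float, float, float, float]  # (xul, yul, xlr, ylr), y assumed to grow upward


def area(r: Key) -> float:
    xul, yul, xlr, ylr = r
    return (xlr - xul) * (yul - ylr)


def union(P: List[Key]) -> Key:
    """Bounding rectangle that encloses every rectangle in P."""
    return (min(r[0] for r in P), max(r[1] for r in P),
            max(r[2] for r in P), min(r[3] for r in P))


def penalty(E: Key, F: Key) -> float:
    """Area increase needed for E's rectangle to also cover F's."""
    q = union([E, F])
    return area(q) - area(E)
```

For instance, growing the 2-by-2 square (0, 2, 2, 0) to cover (1, 3, 3, 1) yields the 3-by-3 square (0, 3, 3, 0), so the penalty is 9 - 4 = 5. A rectangle already inside E costs nothing.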

R-Trees Using GiST: Compress and Decompress
Compress(E): form the bounding rectangle of E.p.
Decompress(E): the identity function.