Generalized Search Trees. J.M. Hellerstein, J.F. Naughton and A. Pfeffer, “Generalized Search Trees for Database Systems,” Proc. 21st Int’l Conf. on VLDB,


Generalized Search Trees
J.M. Hellerstein, J.F. Naughton and A. Pfeffer, “Generalized Search Trees for Database Systems,” Proc. 21st Int’l Conf. on VLDB, Sep
Presented by Ihab Ilyas

Topics
Motivation. Database search trees. Generalized search tree. Properties. Methods. Applications.

Motivation
New applications (multimedia, CAD tools, document libraries, etc.). New data types. Extending search trees for maximum flexibility.

Before GiST
Specialized search trees. Example: spatial search trees (R-trees). Problem: each new application requires a new tree structure built from scratch.
Search trees for extensible data types. Example: extending the B+-tree to index any ordinal data. Problem: this extends the data types supported, but not the set of queries.

GiST
A third direction for extending search trees: extensible both in the data types supported and in the queries applied to that data. It allows new data types to be indexed in a manner that supports the queries natural to each type.

GiST (Cont.)
Unifies previously disparate structures for currently common data types. Example: B+-trees and R-trees can both be implemented as extensions of GiST. A single code base can index multiple dissimilar applications.

Database Search Trees
[Figure: canonical rough picture of a database search tree. Internal nodes hold keys (Key1, Key2, …) and sit above the leaf nodes, which form a linked list.]

Search Trees (Cont.)
Search key: a search key may be any arbitrary predicate that holds for each datum below the key.
Search tree: a hierarchy of categorizations, in which each categorization holds for all data stored under it in the hierarchy.

Generalized Search Tree
Definition: a GiST is a balanced multi-way tree of variable fan-out between kM and M, where k is the minimum fill factor. The exception is the root node, which can have fan-out from 2 to M.

GiST (Cont.)
Leaf nodes contain entries (p, ptr), where p is a predicate used as a search key and ptr is the identifier of some tuple in the database.
Non-leaf nodes contain entries (p, ptr), where p is a predicate used as a search key and ptr is a pointer to another tree node.

Properties
Every node contains between kM and M index entries unless it is the root.
For each index entry (p, ptr) in a leaf node, p holds for the tuple identified by ptr.
For each index entry (p, ptr) in a non-leaf node, p is true when instantiated with the values of any tuple reachable from ptr.
All leaves appear on the same level.
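
The structural properties can be checked mechanically. What follows is a minimal Python sketch, not code from the paper: the `Entry` and `Node` classes, the `check_structure` helper, and the decision to require at least two entries only in a non-leaf root are assumptions made for illustration. The predicate properties are not checked here, since they depend on the key class.

```python
import math
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Entry:
    pred: Any  # predicate p used as the search key
    ptr: Any   # tuple identifier in a leaf, child Node in a non-leaf


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def leaf_depths(node: Node, depth: int = 0) -> set:
    """Collect the depths at which leaves occur below `node`."""
    if node.is_leaf:
        return {depth}
    depths = set()
    for entry in node.entries:
        depths |= leaf_depths(entry.ptr, depth + 1)
    return depths


def check_structure(root: Node, M: int, k: float) -> bool:
    """Check the fan-out bounds and that all leaves share one level."""
    min_fill = math.ceil(k * M)

    def fanout_ok(node: Node, is_root: bool) -> bool:
        n = len(node.entries)
        if n > M:
            return False
        if is_root:
            # Assumption: a root that is itself a leaf may hold fewer than 2 entries.
            if not node.is_leaf and n < 2:
                return False
        elif n < min_fill:
            return False
        if node.is_leaf:
            return True
        return all(fanout_ok(e.ptr, False) for e in node.entries)

    return fanout_ok(root, True) and len(leaf_depths(root)) == 1
```

With M = 4 and k = 0.5, a root over two leaves of two entries each passes the check, while a tree containing a one-entry leaf or leaves at different depths does not.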

Note on Properties
[Figure: tree fragment with entries (p, ptr) and (p’, ptr’) above entries (p1, ptr1) and (p2, ptr2).]
p holds for p1 and p2, and p’ holds for p1 and p2, but p’ → p is not required. This gives the ability to use orthogonal classifications. Recall the R-tree.

GiST Methods
Key methods: the methods the user specifies to configure the GiST. They encapsulate the structure and behavior of the object class used for keys in the tree.
Tree methods: provided by the GiST itself; they may invoke the required key methods.

Key Methods
In what follows, E is an entry of the form (p, ptr), q is a query, and P is a set of entries.
Consistent(E, q): returns false if p ∧ q is guaranteed unsatisfiable, true otherwise.
Union(P): returns a predicate r that holds for all tuples stored below the entries in P.
Compress(E): returns (p’, ptr), where p’ is a compressed representation of p.
Decompress(E): returns (r, ptr) where p → r. The compression may be lossy, since p ↔ r is not required.

Key Methods (Cont.)
Penalty(E1, E2): returns a domain-specific penalty for inserting E2 into the subtree rooted at E1. Typically the penalty metric represents the increase in size from E1.p to Union({E1, E2}).
PickSplit(P): given a set P of M+1 entries, splits P into two sets of entries P1 and P2, each of size at least kM. The choice of the minimum fill factor is controlled here.
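
One way to picture the six key methods is as an abstract interface that each key class fills in. The Python sketch below is illustrative only: the class names `GiSTKeyMethods` and `SetKeys`, the representation of entries as `(p, ptr)` tuples, and the toy predicate "value is in set S" are all assumptions, not part of the paper. The toy `pick_split` does not enforce the minimum fill factor.

```python
from abc import ABC, abstractmethod


class GiSTKeyMethods(ABC):
    """The six key methods a user supplies; entries are (p, ptr) pairs."""

    @abstractmethod
    def consistent(self, entry, query): ...

    @abstractmethod
    def union(self, entries): ...

    @abstractmethod
    def compress(self, entry): ...

    @abstractmethod
    def decompress(self, entry): ...

    @abstractmethod
    def penalty(self, e1, e2): ...

    @abstractmethod
    def pick_split(self, entries): ...


class SetKeys(GiSTKeyMethods):
    """Toy key class: predicate p is a non-empty frozenset S meaning 'value is in S'."""

    def consistent(self, entry, query):
        # False only when p AND q cannot both hold (the sets are disjoint).
        return bool(entry[0] & query)

    def union(self, entries):
        result = frozenset()
        for pred, _ in entries:
            result |= pred
        return result

    def compress(self, entry):
        return entry  # stored as-is in this toy class

    def decompress(self, entry):
        return entry

    def penalty(self, e1, e2):
        # Growth in size from E1.p to Union({E1, E2}).
        return len(self.union([e1, e2])) - len(e1[0])

    def pick_split(self, entries):
        # Order entries by their smallest member and cut the list in half.
        ordered = sorted(entries, key=lambda e: min(e[0]))
        half = len(ordered) // 2
        return ordered[:half], ordered[half:]
```

For example, inserting an entry with predicate {2, 5} under an entry with predicate {1, 2} has penalty 1 in this toy class, because the union {1, 2, 5} has one more member than {1, 2}.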

Tree Methods
Search: controlled by the Consistent method.
Insert: controlled by the Penalty and PickSplit methods.
Delete: controlled by the Consistent method.
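
To make "Search is controlled by Consistent" concrete, here is a hedged Python sketch of a recursive search. The dictionary-based node layout and the function name `search` are assumptions for illustration; the only GiST-specific ingredient is the user-supplied `consistent(entry, query)` callback.

```python
def search(node, query, consistent):
    """Return the ptrs of all leaf entries consistent with `query`.

    `node` is assumed to be a dict of the form
    {"leaf": bool, "entries": [(pred, ptr), ...]}; in a non-leaf node
    each ptr is a child node of the same form.
    """
    results = []
    for entry in node["entries"]:
        if not consistent(entry, query):
            continue  # p AND q is unsatisfiable, so this subtree is pruned
        _, ptr = entry
        if node["leaf"]:
            results.append(ptr)
        else:
            results.extend(search(ptr, query, consistent))
    return results
```

Unlike a B+-tree lookup, this search may descend into several children of the same node, because more than one entry can be consistent with the query.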

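Example
[Figure: inserting a new entry (q, ptr). At the first level the candidate subtrees have penalties m and n with m < n; at the next level the penalties are i and j with j < i. At each level the subtree with the smaller penalty is followed. The target leaf is full, so it is split according to PickSplit.]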
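
The descent in this example can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the dictionary node layout and the names `choose_leaf` and `add_to_leaf` are hypothetical, and the sketch omits the steps that update the predicates in parent nodes and install the new sibling after a split.

```python
def choose_leaf(node, new_entry, penalty):
    """Follow the minimum-penalty entry at each level down to a leaf."""
    path = [node]
    while not node["leaf"]:
        best = min(node["entries"], key=lambda e: penalty(e, new_entry))
        node = best[1]  # the ptr of a non-leaf entry is the child node
        path.append(node)
    return node, path


def add_to_leaf(leaf, new_entry, M, pick_split):
    """Add the entry; if the leaf overflows, split it and return the new sibling."""
    entries = leaf["entries"] + [new_entry]
    if len(entries) <= M:
        leaf["entries"] = entries
        return None
    left, right = pick_split(entries)
    leaf["entries"] = left
    return {"leaf": True, "entries": right}
```

The path returned by `choose_leaf` is what a full implementation would walk back up to adjust keys and propagate splits.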

Applications
GiST over ℤ (B+-trees).
GiST over polygons in ℝ² (R-trees).

B+ Trees Using GiST
The predicate p has the form Contains([x_p, y_p), v).
Consistent(E, q) returns true if:
  q = Contains([x_q, y_q), v) and (x_p < y_q) ∧ (y_p > x_q), or
  q = Equal(x_q, v) and x_p ≤ x_q < y_p.
Union(P) returns [MIN(x_1, x_2, …, x_n), MAX(y_1, y_2, …, y_n)).
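
The two B+-tree methods on this slide translate almost directly into code. The Python sketch below is a minimal illustration; representing a predicate as an `(x, y)` tuple for the half-open range [x, y) and tagging queries as `("contains", (x_q, y_q))` or `("equal", x_q)` are assumptions made here, not conventions from the paper.

```python
def consistent(p, q):
    """p is (x_p, y_p) for the range [x_p, y_p); q is a tagged query tuple."""
    x_p, y_p = p
    kind, arg = q
    if kind == "contains":
        x_q, y_q = arg
        # Two half-open ranges intersect iff each starts before the other ends.
        return x_p < y_q and y_p > x_q
    if kind == "equal":
        return x_p <= arg < y_p
    raise ValueError(f"unknown query kind: {kind}")


def union(preds):
    """Smallest half-open range covering every range in `preds`."""
    return (min(x for x, _ in preds), max(y for _, y in preds))
```

Note that the range [10, 20) is not consistent with a Contains query on [20, 30), because the upper bound 20 is excluded.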

B+ Trees Using GiST (Cont.)
Penalty(E, F), for E = ([x_1, y_1), ptr_1) and F = ([x_2, y_2), ptr_2):
  If E is the leftmost pointer on its node, returns MAX(y_2 − y_1, 0).
  If E is the rightmost pointer on its node, returns MAX(x_1 − x_2, 0).
  Otherwise, returns MAX(y_2 − y_1, 0) + MAX(x_1 − x_2, 0).
PickSplit(P): the first ⌊|P|/2⌋ entries in order go to the left node and the remaining entries go to the right node.
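
A small Python sketch of these two methods, assuming ranges are `(x, y)` tuples and that the caller passes the position of E on its node as a string (the `position` parameter is a hypothetical convenience, not something the paper defines):

```python
def penalty(e, f, position="middle"):
    """Cost of widening range e = (x1, y1) so that it covers f = (x2, y2)."""
    (x1, y1), (x2, y2) = e, f
    grow_right = max(y2 - y1, 0)
    grow_left = max(x1 - x2, 0)
    if position == "leftmost":
        return grow_right  # the leftmost pointer already covers everything to the left
    if position == "rightmost":
        return grow_left   # the rightmost pointer already covers everything to the right
    return grow_right + grow_left


def pick_split(ranges):
    """First floor(|P|/2) ranges in order go left, the rest go right."""
    ordered = sorted(ranges)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]
```

A range that already covers the new key has penalty 0, so insertion follows the pointer a conventional B+-tree would follow.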

B+ Trees Using GiST (Cont.)
Compress(E): if E is the leftmost key on a non-leaf node, returns 0 bytes; otherwise, returns E.p.x.
Decompress(E):
  If E is the leftmost key on a non-leaf node, let x = −∞; otherwise let x = E.p.x.
  If E is the rightmost key on a non-leaf node, let y = ∞. If E is any other entry in a non-leaf node, let y = the value stored in the next key. Otherwise (E is in a leaf node), let y = x + 1.
  Returns ([x, y), ptr).
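
Because Decompress needs to know an entry's neighbours, the sketch below works on a whole node at a time. This is an assumption made for brevity: the functions `compress_node` and `decompress_node`, the use of `None` for the 0-byte leftmost key, and integer keys in leaves are illustrative choices, not the paper's interface.

```python
import math


def compress_node(ranges, is_leaf):
    """Keep only each lower bound x; the leftmost non-leaf key stores nothing."""
    stored = []
    for i, (x, _) in enumerate(ranges):
        stored.append(None if (i == 0 and not is_leaf) else x)
    return stored


def decompress_node(stored, is_leaf):
    """Rebuild half-open ranges [x, y) from the stored lower bounds."""
    ranges = []
    for i, x in enumerate(stored):
        if is_leaf:
            ranges.append((x, x + 1))  # a leaf key denotes a single integer value
            continue
        lo = -math.inf if i == 0 else x
        hi = math.inf if i == len(stored) - 1 else stored[i + 1]
        ranges.append((lo, hi))
    return ranges
```

The round trip can widen a predicate: a middle range [10, 18) followed by a key starting at 20 decompresses to [10, 20). That is allowed, since only p → r is required.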

R-Trees Using GiST
The key has the form (x_ul, y_ul, x_lr, y_lr), giving the upper-left and lower-right corners of a rectangle. The query predicates are:
Contains((x_ul1, y_ul1, x_lr1, y_lr1), (x_ul2, y_ul2, x_lr2, y_lr2)): returns true if (x_ul1 ≤ x_ul2) ∧ (y_ul1 ≥ y_ul2) ∧ (x_lr1 ≥ x_lr2) ∧ (y_lr1 ≤ y_lr2).
Overlaps((x_ul1, y_ul1, x_lr1, y_lr1), (x_ul2, y_ul2, x_lr2, y_lr2)): returns true if (x_ul1 ≤ x_lr2) ∧ (y_ul1 ≥ y_lr2) ∧ (x_ul2 ≤ x_lr1) ∧ (y_lr1 ≤ y_ul2).
Equal((x_ul1, y_ul1, x_lr1, y_lr1), (x_ul2, y_ul2, x_lr2, y_lr2)): returns true if (x_ul1 = x_ul2) ∧ (y_ul1 = y_ul2) ∧ (x_lr1 = x_lr2) ∧ (y_lr1 = y_lr2).
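
The three rectangle predicates can be written out as a short Python sketch. It assumes a coordinate system in which y grows upward, so that the upper-left corner has the larger y value; the `Rect` type and the lowercase function names are illustrative.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x_ul: float  # upper-left corner
    y_ul: float
    x_lr: float  # lower-right corner
    y_lr: float


def contains(a: Rect, b: Rect) -> bool:
    """True if rectangle a fully encloses rectangle b."""
    return (a.x_ul <= b.x_ul and a.y_ul >= b.y_ul
            and a.x_lr >= b.x_lr and a.y_lr <= b.y_lr)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if a and b share at least one point."""
    return (a.x_ul <= b.x_lr and a.y_ul >= b.y_lr
            and b.x_ul <= a.x_lr and a.y_lr <= b.y_ul)


def equal(a: Rect, b: Rect) -> bool:
    """True if all four coordinates match."""
    return a == b
```

Containment is one-directional, while overlap is symmetric: a large rectangle contains a small one inside it, and each overlaps the other.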

R-Trees Using GiST (Cont.)
Consistent(E, q): p is Contains((x_ul1, y_ul1, x_lr1, y_lr1), v), and q is Contains, Overlaps or Equal on the argument (x_ul2, y_ul2, x_lr2, y_lr2). For any of these queries, returns true if Overlaps((x_ul1, y_ul1, x_lr1, y_lr1), (x_ul2, y_ul2, x_lr2, y_lr2)).
Union(P): returns the coordinates of the minimum bounding rectangle of all rectangles in P.
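
A hedged Python sketch of these two methods, with rectangles as plain `(x_ul, y_ul, x_lr, y_lr)` tuples and y assumed to grow upward (both are assumptions made for illustration):

```python
def overlaps(a, b):
    """True if rectangles a and b share at least one point."""
    return (a[0] <= b[2] and a[1] >= b[3]
            and b[0] <= a[2] and a[3] <= b[1])


def consistent(p, q_rect):
    """The same overlap test serves Contains, Overlaps and Equal queries."""
    return overlaps(p, q_rect)


def union(rects):
    """Minimum bounding rectangle of all rectangles in `rects`."""
    return (min(r[0] for r in rects),   # leftmost x
            max(r[1] for r in rects),   # topmost y
            max(r[2] for r in rects),   # rightmost x
            min(r[3] for r in rects))   # bottommost y
```

Using the overlap test for every query type is safe because Consistent only has to rule out subtrees that cannot hold a match: if a bounding rectangle does not even overlap the query rectangle, nothing below it can contain, overlap, or equal that rectangle.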

R-Trees Using GiST (Cont.)
Penalty(E, F): computes q = Union({E, F}) and returns area(q) − area(E.p).
PickSplit(P): a variety of algorithms are available to best split the entries in an over-full node.
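
The area-growth penalty is easy to work through numerically. The Python sketch below again assumes `(x_ul, y_ul, x_lr, y_lr)` tuples with y growing upward; the helper names are illustrative.

```python
def area(r):
    """Area of a rectangle given as (x_ul, y_ul, x_lr, y_lr)."""
    return (r[2] - r[0]) * (r[1] - r[3])


def union(rects):
    """Minimum bounding rectangle of all rectangles in `rects`."""
    return (min(r[0] for r in rects), max(r[1] for r in rects),
            max(r[2] for r in rects), min(r[3] for r in rects))


def penalty(e, f):
    """Extra area needed for e's bounding rectangle to also cover f."""
    return area(union([e, f])) - area(e)
```

For example, with E = (0, 4, 4, 0) (area 16) and F = (2, 6, 6, 2), the union is (0, 6, 6, 0) with area 36, so the penalty is 20. A rectangle already inside E has penalty 0.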

R-Trees Using GiST (Cont.)
Compress(E): forms the bounding rectangle of E.p.
Decompress(E): the identity function.