E.G.M. PetrakisB-trees1 Multiway Search Tree (MST)  Generalization of BSTs  Suitable for disk  MST of order n:  Each node has n or fewer sub-trees.

Slides:



Advertisements
Similar presentations
B-Trees. Motivation for B-Trees Index structures for large datasets cannot be stored in main memory Storing it on disk requires different approach to.
Advertisements

CS4432: Database Systems II
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
1 B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Comparing B-trees and AVL-trees Searching a B-tree Insertion in a B-tree.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
Tirgul 6 B-Trees – Another kind of balanced trees Problem set 1 - some solutions.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
E.G.M. Petrakissearching1 Searching  Find an element in a collection in the main memory or on the disk  collection: (K 1,I 1 ),(K 2,I 2 )…(K N,I N )
File Organizations March 2007R McFadyen ACS In SQL Server 2000 Tree terms root, internal, leaf, subtree parent, child, sibling balanced, unbalanced.
B-Trees and B+-Trees Disk Storage What is a multiway tree?
Preliminaries Multiway trees have nodes with greater than two children. Multiway trees of order k have nodes with most k children Trees –For all.
E.G.M. PetrakisHashing1 Hashing on the Disk  Keys are stored in “disk pages” (“buckets”)  several records fit within one page  Retrieval:  find address.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
CS4432: Database Systems II
CPSC 335 BTrees Dr. Marina Gavrilova Computer Science University of Calgary Canada.
1 Multiway trees & B trees & 2_4 trees Go&Ta Chap 10.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
 B+ Tree Definition  B+ Tree Properties  B+ Tree Searching  B+ Tree Insertion  B+ Tree Deletion.
ICS 220 – Data Structures and Algorithms Week 7 Dr. Ken Cosh.
B+ Trees COMP
Database Management 8. course. Query types Equality query – Each field has to be equal to a constant Range query – Not all the fields have to be equal.
Multi-way Trees. M-way trees So far we have discussed binary trees only. In this lecture, we go over another type of tree called m- way trees or trees.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
1 B-Trees & (a,b)-Trees CS 6310: Advanced Data Structures Western Michigan University Presented by: Lawrence Kalisz.
INTRODUCTION TO MULTIWAY TREES P INTRO - Binary Trees are useful for quick retrieval of items stored in the tree (using linked list) - often,
B-Trees And B+-Trees Jay Yim CS 157B Dr. Lee.
COSC 2007 Data Structures II Chapter 15 External Methods.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
DBMS 2001Notes 4.1: B-Trees1 Principles of Database Management Systems 4.1: B-Trees Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina,
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
Starting at Binary Trees
1 Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: –Insert/delete –Multiple search.
File Organization and Processing Week Tree Tree.
Arboles B External Search The algorithms we have seen so far are good when all data are stored in primary storage device (RAM). Its access is fast(er)
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Chapter 7 Trees_Part3 1 SEARCH TREE. Search Trees 2  Two standard search trees:  Binary Search Trees (non-balanced) All items in left sub-tree are less.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
Spring 2003 ECE569 Lecture 05.1 ECE 569 Database System Engineering Spring 2003 Yanyong Zhang
 B-tree is a specialized multiway tree designed especially for use on disk  B-Tree consists of a root node, branch nodes and leaf nodes containing the.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 B+-Tree Index Chapter 10 Modified by Donghui Zhang Nov 9, 2005.
Internal and External Sorting External Searching
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
1 Query Processing Part 3: B+Trees. 2 Dense and Sparse Indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Insertions.
ITEC 2620M Introduction to Data Structures Instructor: Prof. Z. Yang Course Website: ec2620m.htm Office: TEL 3049.
Multiway Search Trees Data may not fit into main memory
Tree-Structured Indexes: Introduction
Extra: B+ Trees CS1: Java Programming Colorado State University
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
CMSC 341 Lecture 10 B-Trees Based on slides from Dr. Katherine Gibson.
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
B- Trees D. Frey with apologies to Tom Anastasio
B- Trees D. Frey with apologies to Tom Anastasio
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B- Trees D. Frey with apologies to Tom Anastasio
Multiway Search Tree (MST)
Presentation transcript:

E.G.M. PetrakisB-trees1 Multiway Search Tree (MST)  Generalization of BSTs  Suitable for disk  MST of order n:  Each node has n or fewer sub-trees  S 1 S 2 …. S m, m  n  Each node has n-1 or fewer keys  K 1 Κ 2 …Κ m-1 : m-1 keys in ascending order K(S i )  Κ i  K(S i+1 ), K(S m-1 ) < K(S m )

E.G.M. PetrakisB-trees2 n=4 n=3

E.G.M. PetrakisB-trees3 MSTs for Disk  Nodes correspond to disk pages  Pros:  tree height is low for large n  fewer disk accesses  Cons:  low space utilization if non-full  MSTs are non-balanced in general!

E.G.M. PetrakisB-trees keys, n=5  At least 4000/(5-1) nodes (pages)  1 st level (root): 1 node, 4 keys, 5 sub-trees +  2 nd level: 5 nodes, 20 keys, 25 sub-trees +  3 rd level: 25 nodes, 100 keys, 125 sub-trees +  4 th level: 125 nodes, 500 keys,525 sub-trees +  5 th level: 525 nodes, 2100 keys, 2625 sub- tress +  6 th level: 2625 nodes, keys, ….  tree height = 6 (including root)  If n = 11 at least 400 nodes and tree height = 3

E.G.M. PetrakisB-trees5 Operations on MSTs  Search: returns pointer to node containing the key and position of key in the node  Insert new key if not already in the tree  Delete existing key

E.G.M. PetrakisB-trees6 Important Issues  Keep MST balanced after insertions or deletions  Balanced MSTs: B-trees, B+-trees  Reduce number of disk accesses  Data storage: two alternatives 1. inside nodes: less sub-trees, nodes 2. pointers from the nodes to data pages

E.G.M. PetrakisB-trees7 Search Algorithm MST * search (MST * tree, int n, keytype key, int * position) { MST * p = tree; while (p != null) {// search a node i = nodesearch(p, key); if (i < numtrees(p) – 1 && key == key(p, i)) { * position = i; return p; } p = son(p, i); } position = -1; return(-1); }

E.G.M. PetrakisB-trees8

E.G.M. PetrakisB-trees9 Insertions  Search & insert key it if not there a) the tree grows at the leafs => the tree becomes imbalanced => not equal number of disk accesses for every key b) the shape of the tree depends on the insertion sequence c) low storage utilization, many non-full nodes

E.G.M. PetrakisB-trees n = 4 max 3 keys/node insert:

E.G.M. PetrakisB-trees11 MST Deletions  Find and delete a key from a node  free the node if empty  If the key has right or/and left sub- trees => find its successor or predecessor  min value of right subtree or max value of left subtree  put this in the position of the key and delete the successor or predecessor  The tree becomes imbalanced

E.G.M. PetrakisB-trees n= delete 150,

E.G.M. PetrakisB-trees13 Definition of n-Order B-Tree  Every path from the root to a leaf has the same length h >= 0  Each node except the root and the leafs has at least sub-trees and keys  Each node has at most n sub-trees and n- 1 keys  The root has at least 1 node and 2 sub- trees or it is a leaf

E.G.M. PetrakisB-trees n= n= n=7

E.G.M. PetrakisB-trees15 B-Tree Insertions  Find leaf to insert new node  If not full, insert key in proper position else  Create a new node and split contents of old node  left and right node  the n/2 lower keys go into the left node  the n/2 larger keys go into the right node  the separator key goes up to the father node  If n is even: one more key in left (left bias) or right node (right bias)

E.G.M. PetrakisB-trees16 n=5

E.G.M. PetrakisB-trees17 n=4

E.G.M. PetrakisB-trees18 B-Tree Insertions (cont.)  If the father node is full: split it and proceed the same way, the new father may split again if it is full …  The splits may reach the root which may also split creating a new root  The changes are propagated from the leafs to the root  The tree remains balanced

E.G.M. PetrakisB-trees19 n=5

E.G.M. PetrakisB-trees20 Advantages  The tree remains balanced!!  Good storage utilization: about 50%  Low tree height, equal for all leafs => same number of disk accesses for all leafs  B-trees: MSTs with additional properties  special insertion & deletion algorithms

E.G.M. PetrakisB-trees21 B-Tree Deletions Two cases: a) Merging: If less than n/2 keys in left and right node then, sum of keys + separator key in father node < n  the two nodes are merged,  one node is freed b) Borrowing: if after deletion the node has less than n/2 keys and its left (or right) node has more than n/2 keys  find successor of separator key  put separator in the place of the deleted key  put the successor as separator key

E.G.M. PetrakisB-trees22 n = 4 n = 5

E.G.M. PetrakisB-trees23 n = 5

E.G.M. PetrakisB-trees24 Performance [log n (N+1) ]  h  [log (n+1)/2 ((N+1)/2) ] +1  Search: the cost increases with the log n N  Tree height ~ log n N disc accesses n N

E.G.M. PetrakisB-trees25 Performance (cont.)  Insertions/Deletions: Cost proportional to 2h  the changes propagate towards the root  Observation: The cost drops with n  larger size of track (page) but, in fact the cost increases with the amount of information which is transferred from the disk into the main memory

E.G.M. PetrakisB-trees26 Problem  B-trees are good for random accesses  E.g., search(30)  But not for range queries: require multiple tree traversals  search(20-30)

E.G.M. PetrakisB-trees27 B+-Trees  At most n sub-trees and n-1 keys  At least sub-trees and keys  Root: at least 2 sub-trees and 1 key  The keys can be repeated in non- leaf nodes  Only the leafs point to data pages  The leafs are linked together with pointers

E.G.M. PetrakisB-trees28 index data pages

E.G.M. PetrakisB-trees29 Performance  Better than B-trees for range searching  B-trees are better for random accesses  The search must reach a leaf before it is confirmed  internal keys may not correspond to actual record information (can be separator keys only)  insertions: leave middle key in father node  deletions: do not always delete key from internal node (if it is a separator key)

E.G.M. PetrakisB-trees30 Insertions  Find appropriate leaf node to insert key  If it is full:  allocate new node  split its contents and  insert separator key in father node  If the father is full  allocate new node and split the same way  continue upwards if necessary  if the root is split create new root with two sub-trees

E.G.M. PetrakisB-trees31 Deletions  Find and delete key from the leaf  If the leaf has < n/2 keys a)borrowing if its neighbor leaf has more than n/2 keys update father node (the separator key may change) or b)merging with neighbor if both have < n keys  causes deletion of separator in father node  update father node  Continue upwards if father node is not the root and has less than n/2 keys

E.G.M. PetrakisB-trees data pages data pages: min 2, max 3 records index pages: min 2, max 3 pointers B+ order: n = 4, n/2 =2 index data pages don’t store pointers as non-leaf nodes do but they are kinked together

E.G.M. PetrakisB-trees insert 55 with page splitting delete 84 with borrowing:

E.G.M. PetrakisB-trees34 delete 31 with merging free page

E.G.M. PetrakisB-trees35 B-trees and Variants  Many variants:  B-trees: Bayer and Mc Greight, 1971  B+-trees, B*-trees the most successful variants  The most successful data organization for secondary storage (based on primary key)  very good performance, comparable to hashing  fully dynamic behavior  good space utilization (69% on the average)

E.G.M. PetrakisB-trees36 B+/B-Trees Comparison  B-trees: no key repetition, better for random accesses (do not always reach a leaf), data pages on any node  B+-trees: key repetition, data page on leaf nodes only, better for range queries, easier implementation