Credit for some of the slides in this lecture goes to

Slides:



Advertisements
Similar presentations
CpSc 3220 File and Database Processing Lecture 17 Indexed Files.
Advertisements

 Definition of B+ tree  How to create B+ tree  How to search for record  How to delete and insert a data.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Indexing and.
Data Organization - B-trees. 11.2Database System Concepts A simple index Brighton A Downtown A Downtown A Mianus A Perry.
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
Chapter 8 File organization and Indices.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part A Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
1 Indexing Structures for Files. 2 Basic Concepts  Indexing mechanisms used to speed up access to desired data without having to scan entire.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
1 B+ Trees. 2 Tree-Structured Indices v Tree-structured indexing techniques support both range searches and equality searches. v ISAM : static structure;
CS4432: Database Systems II
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
Tree-Structured Indexes. Range Searches ``Find all students with gpa > 3.0’’ –If data is in sorted file, do binary search to find first such student,
CSE AU B-Trees1 B-Trees CSE 373 Data Structures.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Indexing.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 B+-Tree Index Chapter 10 Modified by Donghui Zhang Nov 9, 2005.
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
1 Tree-Structured Indexes Chapter Introduction  As for any index, 3 alternatives for data entries k* :  Data record with key value k   Choice.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Content based on Chapter 10 Database Management Systems, (3 rd.
Chapter 11 Indexing And Hashing (1) Yonsei University 1 st Semester, 2016 Sanghyun Park.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 10.
Database Applications (15-415) DBMS Internals- Part III Lecture 13, March 06, 2016 Mohammad Hammoud.
Tree-Structured Indexes. Introduction As for any index, 3 alternatives for data entries k*: – Data record with key value k –  Choice is orthogonal to.
Data Organization - B-trees
Data Indexing Herbert A. Evans.
B-Trees B-Trees.
Multilevel Indexing and B+ Trees
Multilevel Indexing and B+ Trees
Indexing and hashing.
Tree-Structured Indexes: Introduction
CS 728 Advanced Database Systems Chapter 18
Azita Keshmiri CS 157B Ch 12 indexing and hashing
Tree-Structured Indexes
COP Introduction to Database Structures
Tree Indices Chapter 11.
Extra: B+ Trees CS1: Java Programming Colorado State University
C. Faloutsos Indexing and Hashing – part I
Database Applications (15-415) DBMS Internals- Part III Lecture 15, March 11, 2018 Mohammad Hammoud.
File organization and Indexing
Chapter 11: Indexing and Hashing
B-Trees.
B-Trees CSE 373 Data Structures CSE AU B-Trees.
B+-Trees and Static Hashing
Tree-Structured Indexes
CS222/CS122C: Principles of Data Management Notes #07 B+ Trees
Tree-Structured Indexes
Indexing and Hashing Basic Concepts Ordered Indices
Tree-Structured Indexes
B+Trees The slides for this text are organized into chapters. This lecture covers Chapter 9. Chapter 1: Introduction to Database Systems Chapter 2: The.
Tree-Structured Indexes
Adapted from Mike Franklin
B-Trees CSE 373 Data Structures CSE AU B-Trees.
Chapter 11 Indexing And Hashing (1)
Indexing 1.
CSE 373, Copyright S. Tanimoto, 2002 B-Trees -
Credit for some of the slides in this lecture goes to
General External Merge Sort
Tree-Structured Indexes
B-Trees CSE 373 Data Structures CSE AU B-Trees.
B-Trees.
CS4433 Database Systems Indexing.
Tree-Structured Indexes
Chapter 11: Indexing and Hashing
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #06 B+ trees Instructor: Chen Li.
CS222P: Principles of Data Management UCI, Fall Notes #06 B+ trees
Presentation transcript:

Credit for some of the slides in this lecture goes to www.db-book.com Indexing Sampath Jayarathna Cal Poly Pomona Credit for some of the slides in this lecture goes to www.db-book.com

Basic Concepts Indexing mechanisms used to speed up access to desired data. E.g., author catalog in library Search Key - attribute to set of attributes used to look up records in a file. An index file consists of records (called index entries) of the form Index files are typically much smaller than the original file Two basic kinds of indices: Ordered indices: search keys are stored in sorted order Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”. search-key pointer

SQL Syntax To create an index for a table There may be other options based on the DBMS CREATE INDEX Index_Name ON Table_Name (index_column_name, ….., ….) To see if the indexes exist on the table SHOW INDEXES FROM Table_Name

Index Evaluation Metrics Access types supported efficiently. E.g., records with a specified value in the attribute or records with an attribute value falling in a specified range of values. Access time Insertion time Deletion time Space overhead

Ordered Indices In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog in library. Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file. The search key of a primary index is usually but not necessarily the primary key. Index-sequential file: ordered sequential file with a primary index. Indexed Sequential Access Method (ISAM) Secondary index: an index whose search key specifies an order different from the sequential order of the file.

Ordered Indices Primary index Secondary index 30 13 5 18 14 16 35 43 Data Page Data Page Data Page Data Page Data Page Data Page Secondary index 30 13 5 18 14 16 35 43 Data Page Data Page Data Page Data Page Data Page Data Page

Secondary Indices Example Secondary index on salary field of instructor Index record points to a bucket that contains pointers to all the actual records with that particular search-key value. Secondary indices have to be dense Indices offer substantial benefits when searching for records. BUT: Updating indices imposes overhead on database modification -- when a file is modified, every index on the file must be updated, 7

Dense Index Files Dense index — Index record appears for every search-key value in the file. E.g. index on ID attribute of instructor relation

Dense Index Files (Cont.) Dense index on dept_name, with instructor file sorted on dept_name

Sparse Index Files Sparse Index: contains index records for only some search-key values. Applicable when records are sequentially ordered on search-key To locate a record with search-key value K we: Find index record with largest search-key value < K Search file sequentially starting at the record to which the index record points Compared to dense indices: Less space and less maintenance overhead for insertions and deletions. Generally slower than dense index for locating records.

Multilevel Index If primary index does not fit in memory, access becomes expensive. Solution: treat primary index kept on disk as a sequential file and construct a sparse index on it. outer index – a sparse index of primary index inner index – the primary index file If even outer index is too large to fit in main memory, yet another level of index can be created, and so on. Indices at all levels must be updated on insertion or deletion from the file.

Multilevel Index Single level Index upper index Multilevel Index 30 13 5 18 14 16 35 43 Data Page Data Page Data Page Data Page Data Page Data Page upper index Data Page 30 13 5 18 14 16 35 43 17 Multilevel Index lower index

Index Update: Deletion If deleted record was the only record in the file with its particular search-key value, the search-key is deleted from the index also. Single-level index entry deletion: Dense indices – deletion of search-key is similar to file record deletion. Sparse indices – if an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order). If the next search-key value already has an index entry, the entry is deleted instead of being replaced.

Index Update: Insertion Single-level index insertion: Perform a lookup using the search-key value appearing in the record to be inserted. if the search-key value does not appear in the index, insert it. Multilevel insertion and deletion: algorithms are simple extensions of the single-level algorithms

ISAM Limitations Problems with ISAM: What if the index itself is too large to fit entirely in RAM at the same time? performance degrades as file grows Periodic reorganization of entire file is required. Insertion and deletion could be very expensive if all records after the inserted or deleted one have to shift up or down, crossing block boundaries.

B Trees B-tree indices are an alternative to indexed-sequential files B-tree is one of the most important data structures in computer science. What does B stand for? Several versions of B-trees have been proposed, including B+ trees. B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. automatically reorganizes itself with small, local, changes, in the face of insertions and deletions. Reorganization of entire file is not required to maintain performance. (Minor) disadvantage of B-trees: extra insertion and deletion overhead, space overhead. Advantages of B-trees outweigh disadvantages B-trees are used extensively

B-Tree Order The literature on B-trees is not uniform in its terminology. We’ll follow the following definition of a B-tree of order m : is a tree which satisfies the following properties: Key order Every node has at most m children. Every non-leaf node (except root) has at least ⌈m/2⌉ children. The root has at least two children if it is not a leaf node. A non-leaf node with k children contains k−1 keys. All leaves appear in the same level

Max Number of Children (or records) B-Tree Order The order, or branching factor, b of a B tree measures the capacity of nodes (i.e., the number of children nodes) for internal nodes in the tree Node Type Children Type Min Number of Children Max Number of Children (or records) Example  b=7 Example b=100 Root Node (when it is the only node in the tree) Records 1 b-1 1–6 1–99 Root Node Internal Nodes or Leaf Nodes 2 b 2–7 2–100 Internal Node Ceil(b/2) 4–7 50–100 Leaf Node Ceil(b/2) -1 3–6 49–99

Constructing a B-tree Suppose we start with an empty B-tree and keys arrive in the following order:1 12 8 2 25 5 14 28 17 7 52 16 48 68 3 26 29 53 55 45 We want to construct a B-tree of order 5 (5-1 keys in each node) The first four items go into the root: To put the fifth item in the root would violate the size of node Therefore, when 25 arrives, pick the middle key to make a new root 1 2 8 12

Constructing a B-tree (contd.) 8 1 2 12 25 6, 14, 28 get added to the leaf nodes: 1 2 8 12 14 6 25 28

Constructing a B-tree (contd.) Adding 17 to the right leaf node would over-fill it, so we take the middle key, push it (to the root) and split the leaf 8 17 1 2 6 12 14 25 28 7, 52, 16, 48 get added to the leaf nodes 8 17 12 14 25 28 1 2 6 16 48 52 7

Constructing a B-tree (contd.) Adding 68 causes us to split the right most leaf, pushing 48 to the root, and adding 3 causes us to split the left most leaf, pushing 3 to the root; 26, 29, 53, 55 then go into the leaves 3 8 17 48 1 2 6 7 12 14 16 25 26 28 29 52 53 55 68 Adding 45 causes a split of 25 26 28 29 and pushing 28 to the root then causes the root to split

Constructing a B-tree (contd.) 17 3 8 28 48 1 2 6 7 12 14 16 25 26 29 45 52 53 55 68

Inserting into a B-Tree Attempt to insert the new key into a leaf If this would result in that leaf becoming too big, split the leaf into two, pushing the middle key to the leaf’s parent If this would result in the parent becoming too big, split the parent into two, pushing the middle key This strategy might have to be repeated all the way to the top If necessary, the root is split in two and the middle key is pushed to a new root, making the tree one level higher

Class Activity 9a Insert the following keys to a order 5 B-tree: 3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56

Deletion in B-Tree Example with m = 5 12 2 3 8 13 27 The root has between 2 and m children. Each non-root internal node has between m/2 and m children. All external nodes are at the same level.

Insert 10 12 2 3 8 10 13 27 We find the location for 10 by following a path from the root using the stored key values to guide the search. The search falls out the tree at the 4th child of the 1st child of the root. The 1st child of the root has room for the new element, so we store it there.

   and     and  Insert 11 12 2 3 8 10 11 13 27 We fall out of the tree at the child to the right of key 10. But there is no more room in the left child of the root to hold 11. Therefore, we must split this node...  ⌊(m-1)/2⌋ to left node and ⌈(m-1)/2⌉ to right node. Ex: if order m=5, out of (2,3,8,10,11) keys, 2 keys to left (2,3) and 2 keys to right (10, 11) and middle key (8) to separator

Insert 11 (Continued) 8 12 2 3 10 11 13 27 The parent gets one new child. (If the parent become overflow, then it, too, will have to be split).

Remove 8 8 12 2 3 10 11 13 27 Removing 8 might force us to move another key up from one of the children. It could either be the 3 from the 1st child or the 10 from the second child. However, neither child has more than the minimum number of children (2), so the two nodes will have to be merged. Nothing moves up.

Remove 8 (Continued) 12 2 3 10 11 13 27

Remove 13 12 2 3 10 11 13 27 Removing 13 would cause the node containing it to become underflow. To fix this, we try to reassign one key from a sibling that has spares.

Remove 13 (Cont) 11 2 3 10 12 27 The 13 is replaced by the parent’s key 12. The parent’s key 12 is replaced by the spare key 11 from the left sibling. The sibling has one fewer element.

Remove 11 11 2 3 10 12 27 11 is in a non-leaf, so replace it by the value immediately preceding: 10. 10 is at leaf, and this node has spares, so just delete it there.

Remove 11 (Cont) 10 2 3 12 27

Remove 2 10 2 3 12 27 Although 2 is at leaf level, removing it leads to an underflow node. The node has no left sibling. It does have a right sibling, but that node is at its minimum occupancy already. Therefore, the node must be merged with its right sibling.

Remove 2 (Cont) 3 10 12 27 The result is illegal, because the root does not have at least 2 children. Therefore, we must remove the root, making its child the new root.

Remove 2 (Cont) 3 10 12 27 The new B-tree has only one node, the root.

Insert 49 3 10 12 27 Let’s put an element into this B-tree.

Insert 49 (Cont) 3 10 12 27 49 Adding this key make the node overfull, so it must be split into two. But this node was the root. So we must construct a new root, and make these its children.

Insert 49 (Cont) 12 3 10 27 49 The middle key (12) is moved up into the root. The result is a B-tree with one more level.

Deleting a B-Tree keys – leaf key Delete key: on underflow, may need to merge Delete a key at a leaf – no underflow (easy) Just delete the key Delete a leaf-key, underflow, and rich sibling (left sibling or right sibling ) Rich can give a key, without underflow. But, borrowing a key: only THROUGH the parent (parent key to delete node, and highest key of left sibling or lowest key of right sibling to parent) Delete a leaf-key, underflow and poor sibling Merge by pulling a key from the parent (exact reversal from insertion, ‘split and push up’ vs ‘merge and pull down’) Merge with left sibling or right sibling (if left does not exist) and pull the parent down to merged node If parent underflow: repeat recursively

Deleting a B-Tree keys – Non leaf key Each element in a non-leaf (internal node) acts as a separation value for two subtrees, therefore we need to find a replacement for separation. Note that the largest element in the left subtree is still less than the separator.  Delete a key at a non leaf Just delete the key and replace highest from left sub-tree leaf, if underflow, repeat recursively Deleting a non-leaf eventually deletes a leaf key

Class Activity 9b Given order 5 B-tree created by these data (last exercise): 3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56 Add these further keys: 2, 6,12 Delete these keys: 4, 5, 7, 3, 14

B Trees vs B+ tree

B-Tree Index File Example B-tree (above) and B+-tree (below) on same data

Example of B+-Tree

B+-Tree before and after insertion of “Adams” B+-Tree Insertion B+-Tree before and after insertion of “Adams”

B+-Tree before and after insertion of “Lamport” B+-Tree Insertion B+-Tree before and after insertion of “Lamport”

Inserting 16*, 8* into Example B+ tree Root 13 17 24 30 2* 3* 5* 7* 8* 14* 15* 16* You overflow 2* 5* 7* 3* 17 24 30 13 8* One new child (leaf node) generated; must add one more pointer to its parent, thus one more key value as well. ⌊m/2⌋ keys to left node and ⌈m/2⌉ keys to right node when copy up. 10

Inserting 8* (cont.) Copy up the middle value (leaf split) 13 17 24 30 13 17 24 30 Entry to be inserted in parent node. 5 (Note that 5 is s copied up and continues to appear in the leaf.) 2* 3* 5* 7* 8* You overflow! 5 13 17 24 30 12

Insertion into B+ tree (cont.) Understand difference between copy-up and push-up (similar to B-tree promote) Observe how minimum occupancy is guaranteed in both leaf and index pg splits. 5 13 17 24 30 We split this node, redistribute entries evenly, and push up middle key.  appears once in the index. Contrast Entry to be inserted in parent node. this with a leaf split.) 5 24 30 17 13 (Note that 17 is pushed up and only 12

Example B+ Tree After Inserting 8* Root 17 5 13 24 30 2* 3* 5* 7* 8* 14* 15* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* Notice that root was split, leading to increase in height. 13

Inserting a Data Entry into a B+ Tree: Summary Find correct leaf L. Put data entry onto L. If L has enough space, done! Else, must split L (into L and a new node L2) Redistribute entries evenly, put middle key in L2 copy up middle key. Insert index entry pointing to L2 into parent of L. This can happen recursively To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.) Splits “grow” tree; root split increases height. Tree growth: gets wider or one level taller at top. 6

Delete 19* and 20* Have we still forgot something? Root You underflow 17 5 13 24 30 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 22* 27* 29* You underflow 22* 24* Have we still forgot something? 13

Deleting 19* and 20* (cont.) Notice how 27 is copied up. Root 17 5 13 27 30 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39* Notice how 27 is copied up. But can we move it up? Now we want to delete 24 Underflow again! But can we redistribute this time? 15

Deleting 24* Merge with sibling! You underflow Observe the two leaf nodes are merged, and 27 is discarded from their parent, but … Observe `pull down’ of index entry (below). 30 22* 27* 29* 33* 34* 38* 39* New root 5 13 17 30 2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39* 16

B-Tree Index Files (Cont.) Advantages of B-Tree indices: May use less tree nodes than a corresponding B+-Tree. Sometimes possible to find search-key value before reaching leaf node. Advantages of B+ tree Because B+ trees don't have data associated with interior nodes, more keys can fit on a page of memory. The leaf nodes of B+ trees are linked, so doing a full scan of all objects in a tree requires just one linear pass through all the leaf nodes. A B tree, on the other hand, would require a traversal of every level in the tree. Disadvantages of B-Tree indices: Only small fraction of all search-key values are found early