1 Query Processing Part 3: B+Trees. 2 Dense and Sparse Indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Insertions.

Slides:



Advertisements
Similar presentations
Index tuning-- B+tree. overview Variation on B+tree: B-tree (no +) Idea: –Avoid duplicate keys –Have record pointers in non-leaf nodes.
Advertisements

Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 18 Indexing Structures for Files.
Dr. Kalpakis CMSC 661, Principles of Database Systems Index Structures [13]
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #7.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
COMP 451/651 Indexes Chapter 1.
CS4432: Database Systems II
CS CS4432: Database Systems II Basic indexing.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
1 More on Indexes Secondary Indexes B-Trees Source: our textbook, slides by Hector Garcia-Molina.
1 CS143: Index. 2 Topics to Learn Important concepts –Dense index vs. sparse index –Primary index vs. secondary index (= clustering index vs. non-clustering.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #10.
CS 277 – Spring 2002Notes 41 CS 277: Database System Implementation Notes 4: Indexing Arthur Keller.
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #7.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
1 Lecture 20: Indexes Friday, February 25, Outline Representing data elements (12) Index structures (13.1, 13.2) B-trees (13.3)
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
B+ - Tree & B - Tree By Phi Thong Ho.
File Organizations March 2007R McFadyen ACS In SQL Server 2000 Tree terms root, internal, leaf, subtree parent, child, sibling balanced, unbalanced.
1 Indexing Structures for Files. 2 Basic Concepts  Indexing mechanisms used to speed up access to desired data without having to scan entire.
Primary Indexes Dense Indexes
B-Trees and B+-Trees Disk Storage What is a multiway tree?
CS 245Notes 41 CS 245: Database System Principles Notes 4: Indexing Hector Garcia-Molina.
1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
1 CS143: Index. 2 Topics to Learn Important concepts –Dense index vs. sparse index –Primary index vs. secondary index (= clustering index vs. non-clustering.
CS 255: Database System Principles slides: B-trees
1 CS 728 Advanced Database Systems Chapter 17 Database File Indexing Techniques, B- Trees, and B + -Trees.
CS4432: Database Systems II
Tree-Structured Indexes. Range Searches ``Find all students with gpa > 3.0’’ –If data is in sorted file, do binary search to find first such student,
Indexing and Hashing (emphasis on B+ trees) By Huy Nguyen Cs157b TR Lee, Sin-Min.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
Index Structures for Files Indexes speed up the retrieval of records under certain search conditions Indexes called secondary access paths do not affect.
ICS 220 – Data Structures and Algorithms Week 7 Dr. Ken Cosh.
Database Management 8. course. Query types Equality query – Each field has to be equal to a constant Range query – Not all the fields have to be equal.
Multi-way Trees. M-way trees So far we have discussed binary trees only. In this lecture, we go over another type of tree called m- way trees or trees.
B-Trees And B+-Trees Jay Yim CS 157B Dr. Lee.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
DBMS 2001Notes 4.1: B-Trees1 Principles of Database Management Systems 4.1: B-Trees Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina,
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
B+ tree & B tree Extracted from Garcia Molina
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 111 Database Systems II Index Structures.
1 CSCE 520 Test 2 Info Indexing Modified from slides of Hector Garcia-Molina and Jeff Ullman.
CS 405G: Introduction to Database Systems 12. Index.
1 Ullman et al. : Database System Principles Notes 4: Indexing.
Chapter 11 Indexing And Hashing (1) Yonsei University 1 st Semester, 2016 Sanghyun Park.
Database Applications (15-415) DBMS Internals- Part III Lecture 13, March 06, 2016 Mohammad Hammoud.
CS 728 Advanced Database Systems Chapter 18
Azita Keshmiri CS 157B Ch 12 indexing and hashing
CS 245: Database System Principles Notes 4: Indexing
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
CS 245: Database System Principles Notes 4: Indexing
CPSC-629 Analysis of Algorithms
CPSC-310 Database Systems
(Slides by Hector Garcia-Molina,
CS 245: Database System Principles Notes 4: Indexing
B+Tree Example n=3 Root
Chapter 11 Indexing And Hashing (1)
Database Design and Programming
CPSC-608 Database Systems
CPSC-608 Database Systems
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

1 Query Processing Part 3: B+Trees

2 Dense and Sparse Indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Insertions expensive, and/or - Lose sequentiality & balance

3 ExampleIndex (sequential) continuous free space

4 ExampleIndex (sequential) continuous free space overflow area (not sequential)

5 B+Trees: Main Idea Give up on sequentiality of index Try to get “balance” A node corresponds to a block –Because have to read (at least) one block Control the height by requiring nodes to be at least half full

6 Root B+Tree Examplen=

7 Sample non-leaf to keysto keysto keys to keys < 5757  k<8181  k<95 

8 Sample leaf node From non-leaf node to next leaf in sequence To record with key 57 To record with key 81 To record with key 85

9 In textbook’s notationn=3 Leaf: Non-leaf:

10 Size of nodes:n+1 pointers n keys In leaf nodes, a pointer is associated with each key, and there is one additional pointer to the next leaf in sequential order In non-leaf nodes, the leftmost pointer has no associated value (fixed) Maximal Sizes of Nodes

11 Don’t want nodes to be too empty Use at least Non-leaf:  (n+1)/2  pointers Leaf:  (n+1)/2  pointers to data

12 Full nodemin. node Non-leaf Leaf n= counts even if null

13 B+Tree Rules (tree of order n) (1) All leaves at same lowest level (balanced tree) (2) Pointers in leaves point to records except for “sequence pointer” Third rule on next slide

14 (3) Number of pointers/keys for B+tree Non-leaf (non-root) n+1n  (n+1)/2   (n+1)/2  - 1 Leaf (non-root) nn Rootn+1n21 Max Max Min Min ptrs keys ptrs  data keys  (n+1)/2  The sequence pointer at the bottom level is not counted in the above table It is assumed that there are at least two nodes at the bottom level (or else, there is only a root, which may have no pointers at all if the file is empty)

15 Insertion into a B+Tree (a) simple case –space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root

16 (a) Insert key = 32 n=

17 (a) Insert key = 32 n=

18 (a) Insert key = 7 n=

19 (a) Insert key = 7 n=

20 (a) Insert key = 7 n=

21 (c) Insert key = 160 n=

22 (c) Insert key = 160 n=

23 (c) Insert key = 160 n=

24 (c) Insert key = 160 n=

25 (d) New root, insert 45 n=

26 (d) New root, insert 45 n=

27 (d) New root, insert 45 n=

28 (d) New root, insert 45 n= new root

29 Insertion Made Simple View a node as a sorted list of pairs (k,p) –In a leaf, there is a pair (k,p) for each key and pointer associated with the same record Ignore the sequence pointer to the next node –In a non-leaf node, (k,p) is created for each key and the pointer on its right side The meaning of (k,p) is that p points to the subtree whose smallest key is k

30 What is the Pair for the Leftmost Pointer? Create a virtual pair (k,p) for the leftmost pointer in each non-leaf node d –On the leftmost branch, k is -  –In other non-leaf nodes, k is taken from the last proper (i.e., non-virtual) pair that was used to reach d The virtual pairs are created when traversing the path from the root to a leaf –For insertion, no need to create the virtual pairs

31 Insertion Process We start by inserting a new pair (k,p) into a leaf If a node d becomes overfull –Create a new node d’ on the right side of d, and –Move half of the pairs to the new node If d’ is non-leaf, the key of its leftmost pair becomes virtual (i.e., is deleted after the next step) Create a pair (k’,p’ ), where k’ is the smallest key in d’ and p’ is a pointer to d’ Insert (k’,p’ ) into the parent of d, on the right side of the pair that points to d –(k’,p’ ) is never inserted in the leftmost position If the parent of d becomes overfull, continue recursively (may have to create a new root eventually)

32 (a) Simple case - no example (b) Merge with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf Deletion from a B+tree

33 (b) Merging with sibling –Delete n=4

34 (b) Merging with sibling –Delete n=4 40

35 (c) Redistribute keys –Delete n=4

36 (c) Redistribute keys –Delete n=4 35

(d) Non-leaf merging –Delete 37 n=4 25

(d) Non-leaf merging –Delete 37 n=

(d) Non-leaf merging –Delete 37 n=

(d) Non-leaf merging –Delete 37 n= new root

41 B+Tree Deletions in Practice Often, merging is not implemented –Too hard and not worth it! Files usually grow in size over time

42 B+Trees as Search Keys A B+Tree can implement either a primary or a secondary search key If it is a secondary search key, then the leaves store the records of a one-level dense index, namely, –Pairs of a key and a pointer If it is a primary search key, the leaves can store either a one-level index or the file itself –In the second option, the records are longer than those stored in non-leaf nodes – is it a problem? Can the leaves store a sparse index? Answers on next slide

43 Suppose that a B+tree is a primary search key If the file is stored separately and is sorted on the primary key, then the bottom level of the B+tree is a one-level sparse index If the file is stored separately, but is not sorted (i.e., the file is a heap), then the bottom level must be a one-level dense index If the file is stored in the leaves of the B+tree, then the records in the leaves are longer than those in the non-leaf nodes –The parameter n (which determines the max and min number of records in the nodes) is different in the leaves (i.e., it is smaller than in the non-leaf nodes)

44 B+Trees vs. Dense & Sparse Indexes D & S indexes could be more efficient if –When creating or reorganizing, we leave enough space for future insertions But we should not leave too much space or else we’ll hurt performance –We reorganize before performance deteriorates too much But it is hard to know how much free space to leave and when to reorganize –Hence, B+trees are more efficient in practice

45 Buffering Requirements Since B+trees have no overflow blocks, they need to look at only one block from each level D & S indexes require larger and variable-size buffer space Also, only after reading a block, can we determine whether an overflow block also has to be read (and that overflow block is unlikely to be contiguous with the first block) –Render read-ahead buffering ineffective BTW, is LRU good for B+trees?

46 B Tree: A Variation on a B+Tree In a B Tree, some of the records (of the index or the file) are stored in non-leaf nodes –Why is it a good idea? Disadvantage of B trees –Two sizes of records in non-leaf nodes For fixed-sized blocks, fanout of non-leaf nodes is smaller; hence, tree is deeper and lookup takes longer –Deletion is more complicated –Hard to chain the records (of the index or the file)