IKI 10100: Data Structures & Algorithms. Ruli Manurung (Fasilkom UI), with acknowledgments to Denny & Ade Azurat.


Slide 1: B-Trees (IKI10100 lecture, 17th Apr 2007)

Slide 2: Outline
- Motivation
- B-Tree
- B+Tree
- Insertion
- Deletion

Slide 3: Disks are slow!
- So far, we have assumed that the data fits in RAM.
- If the data is too large, it must be stored on disk.
- The Big-Oh analysis changes: an operation counted as "constant time" may now be a disk access.
- Pentium ('90s): 500 MIPS.
- IDE HDD ('00s): 7200 RPM = 120 accesses per second.
- 1 disk access ≈ 4,000,000 instructions (500 million instructions per second / 120 accesses per second).
- The number of disk accesses dominates the running time!
- In reality, the numbers are even worse (multi-*).
- We are willing to do lots of CPU work to reduce disk accesses!

Slide 4: Example
Analyse the following case: you are asked to implement a database program that stores all phonebook entries in Jakarta, e.g. about 5,000,000 entries.
- Each entry contains: name, address, telephone number, etc.
- Assume each entry fits in a record of 512 bytes.
- Total file size = 5,000,000 * 512 bytes ≈ 2.4 GB.
- Assume: CPU: 25 million instructions per second; Disk: 6 accesses per second.

Slide 5: Disk blocks
When storing data on disks, we use blocks:
- Secondary storage is divided into blocks of equal size.
- Common sizes: 512 bytes, 2 KB, 4 KB, 8 KB, etc.
- A block is the smallest unit transferred between disk and memory.
- Even if a program reads only 10 bytes, it must access 1 whole block.

Slide 6: Using Binary Search Trees
- An unbalanced binary search tree is a disaster: about 5,000,000 disk accesses = 9.6 days!
- Average BST: 1.38 log_2 N = 30 accesses = 5 seconds.
- Some nodes can be expected to lie about 3x deeper = 15 seconds.
- AVL tree: worst case = 1.44 log_2 N = 32 accesses = 5.3 seconds.
- How can we do better than this?
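The timings on this slide follow from simple arithmetic. A minimal sketch, assuming N = 5,000,000 records (an assumption, chosen because it is consistent with a 2.4 GB file of 512-byte records and with the 9.6-day figure) and the 6 disk accesses per second given in the example. All variable names are illustrative.

```python
import math

N = 5_000_000               # assumed number of records (not stated explicitly)
DISK_ACCESSES_PER_SEC = 6   # from the example's assumptions


def seconds(accesses):
    """Time needed for the given number of disk accesses."""
    return accesses / DISK_ACCESSES_PER_SEC


log_n = math.log2(N)                     # about 22.25

# Degenerate (unbalanced) tree: up to one disk access per record.
unbalanced_days = seconds(N) / 86_400    # about 9.6 days

# Average binary search tree: 1.38 log2 N accesses.
average_accesses = 1.38 * log_n          # about 30.7
average_seconds = seconds(average_accesses)

# Balanced tree with the 1.44 log2 N worst-case bound quoted on the slide.
balanced_accesses = 1.44 * log_n         # about 32.0
balanced_seconds = seconds(balanced_accesses)

print(round(unbalanced_days, 1), round(average_seconds, 1), round(balanced_seconds, 1))
```

Even the balanced tree needs about 32 disk accesses per lookup, which is why the next slides reduce the height of the tree rather than the work per node.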

Slide 7: Make fatter trees!
- A complete binary tree with N nodes has a height of about log_2 N.
- A complete M-ary tree with N nodes has a height of about log_M N.
- For N = 31, the best binary tree has height = 5, while a 5-ary tree has height = 3.
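The N = 31 comparison can be checked by counting how many nodes fit on each level. A small sketch, where "height" is counted in levels as on the slide and `levels_needed` is a hypothetical helper name:

```python
def levels_needed(n_nodes, m):
    """Smallest number of levels of a complete m-ary tree that holds n_nodes nodes."""
    levels = 0
    capacity = 0   # nodes that fit in the levels counted so far
    width = 1      # nodes on the next level: 1, m, m^2, ...
    while capacity < n_nodes:
        capacity += width
        width *= m
        levels += 1
    return levels


print(levels_needed(31, 2))   # binary tree: 1 + 2 + 4 + 8 + 16 = 31 nodes
print(levels_needed(31, 5))   # 5-ary tree: 1 + 5 + 25 = 31 nodes
```

The same function shows the effect at scale: 5,000,000 nodes need 23 levels in a binary tree but only 4 levels when each node has 256 children.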

Slide 8: B-Tree
A B-Tree of order M is an M-ary tree with the following properties:
- Data items are stored at the leaves.
- Non-leaf nodes store up to M-1 keys: key i represents the smallest key in subtree i+1.
- The root is either a leaf or has between 2 and M children.
- All non-leaf nodes (except the root) have between ⌈M/2⌉ and M children.
- All leaves are at the same level.

Slide 9: B+Tree
A B+Tree is a variant of the B-Tree:
- All key values (data) are stored at leaf nodes.
- Pointers connect the leaf nodes as a linked list. This enables sequential access to the data without requiring a tree traversal.
- Internal nodes contain keys and are used as an index.

Slide 10: B+Tree
[Figure: a B+Tree with its leaf nodes marked; pointer labels < and >=]

Slide 11: B+Tree Node Structure
A high-level node (internal node): P_1 K_1 P_2 K_2 ... P_(n-1) K_(n-1) P_n
- P_1: pointer to subtree for keys < K_1
- P_2: pointer to subtree for keys >= K_1 & < K_2
- P_(n-1): pointer to subtree for keys >= K_(n-2) & < K_(n-1)
- P_n: pointer to subtree for keys >= K_(n-1)
A leaf node (every key value appears in a leaf node): P_1 K_1 P_2 K_2 ... P_(n-1) K_(n-1) P_n
- P_1: pointer to record (block) with key K_1
- P_2: pointer to record (block) with key K_2
- P_(n-1): pointer to record (block) with key K_(n-1)
- P_n: pointer to leaf with smallest key greater than K_(n-1)

Slide 12: Example of a B+Tree
[Figure: example B+Tree showing the leaf nodes and the actual data records; pointer labels < and >=]

Slide 13: Queries on B+Trees
Find all records with a search-key value of k.
1. Start with the root node.
   a. Examine the node for the smallest search-key value > k.
   b. If such a value exists, call it K_i, then follow P_i to the child node.
   c. Otherwise k >= K_(m-1), where there are m pointers in the node. Then follow P_m to the child node.
2. If the node reached by following the pointer above is not a leaf node, repeat the above procedure on that node and follow the corresponding pointer.
3. Eventually a leaf node is reached. If key K_i = k for some i, follow pointer P_i to the desired record (or bucket). Otherwise no record with search-key value k exists.
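The lookup procedure above can be sketched as follows. This is an illustrative in-memory version, not the lecture's implementation: the `Node` class, its field names, and the sample tree are assumptions, and pointers are 0-indexed, so `children[i]` plays the role of P_(i+1).

```python
from bisect import bisect_right


class Node:
    """Hypothetical B+Tree node: internal nodes have children, leaves have records."""

    def __init__(self, keys, children=None, records=None):
        self.keys = keys            # sorted search keys
        self.children = children    # internal node: len(keys) + 1 subtrees
        self.records = records      # leaf node: one record per key
        self.next_leaf = None       # leaf node: link to the next leaf

    @property
    def is_leaf(self):
        return self.children is None


def find(root, k):
    """Return the record stored under key k, or None if k is absent."""
    node = root
    while not node.is_leaf:
        # Index of the smallest key > k; equals len(keys) when no such
        # key exists, which selects the last pointer.
        i = bisect_right(node.keys, k)
        node = node.children[i]
    for key, record in zip(node.keys, node.records):
        if key == k:
            return record
    return None


# Small sample tree: root [89] over leaves [18, 67] and [89, 123].
leaf_a = Node([18, 67], records=["rec18", "rec67"])
leaf_b = Node([89, 123], records=["rec89", "rec123"])
leaf_a.next_leaf = leaf_b
root = Node([89], children=[leaf_a, leaf_b])

print(find(root, 89), find(root, 50))
```

Each iteration of the `while` loop corresponds to one node visit, which on disk would be one block access.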

Slide 14: Queries on B+Trees: Range Query
Find all records with a search-key value > k and < l (range query).
- Search for k as in the previous query to reach the leaf node where k would appear.
- While the next search-key value is < l, follow the corresponding pointer to the records.
- When the current search-key is the last search-key in a node, follow the last pointer P_n to the next leaf node.
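A range query only needs the leaf level once the starting leaf is known. The sketch below assumes the leaf that would contain k has already been found by a root-to-leaf search; the `Leaf` class and `range_scan` are hypothetical names used for illustration.

```python
from bisect import bisect_right


class Leaf:
    """Hypothetical leaf node: sorted keys, one record per key, link to next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.records = [f"rec{key}" for key in keys]
        self.next_leaf = None


def range_scan(start_leaf, k, l):
    """Yield records with k < key < l, starting at the leaf where k would appear."""
    leaf = start_leaf
    i = bisect_right(leaf.keys, k)      # first position whose key is > k
    while leaf is not None:
        while i < len(leaf.keys):
            if leaf.keys[i] >= l:
                return                  # left the range: stop scanning
            yield leaf.records[i]
            i += 1
        leaf, i = leaf.next_leaf, 0     # follow the last pointer to the next leaf


# Three linked leaves: [18, 34] -> [36, 55, 67] -> [87, 89, 99].
a, b, c = Leaf([18, 34]), Leaf([36, 55, 67]), Leaf([87, 89, 99])
a.next_leaf, b.next_leaf = b, c

print(list(range_scan(a, 30, 89)))
```

No internal node is visited during the scan, which is the benefit of linking the leaves.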

Slide 15: Insertion on B+Trees
- Find the leaf node in which the search-key value would appear.
- If the search-key value is already in the leaf node (non-unique search-key), the record is added to the data file and, if necessary, the search-key and the corresponding pointer are inserted into the leaf node.

Slide 16: Insertion on B+Trees
If the search-key value is not there, add the record to the data file:
- If there is room in the leaf node, insert the (key-value, pointer) pair into the leaf node, keeping it sorted.
- Otherwise, split the node (along with the new (key-value, pointer) entry) as shown in the next slides.
Splitting a node:
- Take the (search-key value, pointer) pairs, including the one being inserted, in sorted order. Place the first ⌈M/2⌉ in the original node and the rest in a new node.
- When splitting a leaf, promote the middle/median key to the parent of the node being split, but retain a copy in the leaf.
- When splitting an internal node, promote the middle/median key to the parent of the node being split, but DO NOT retain a copy in the split node.
- If the parent is full, split it and propagate the split further up.
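The two split rules differ only in what happens to the promoted key. A minimal sketch on plain key lists, assuming that a leaf of an order-M tree overflows when it reaches M keys and that an internal node is split around its median key; `split_leaf` and `split_internal` are hypothetical helper names.

```python
import math


def split_leaf(keys, m):
    """Split an overfull leaf of an order-m B+Tree.

    Returns (left_keys, right_keys, promoted_key). The promoted key is a
    copy of the first key of the right half, which stays in the leaf.
    """
    cut = math.ceil(m / 2)
    left, right = keys[:cut], keys[cut:]
    return left, right, right[0]


def split_internal(keys, children):
    """Split an overfull internal node around its median key.

    Returns ((left_keys, left_children), (right_keys, right_children),
    promoted_key). The promoted key moves up and is kept in neither half.
    """
    mid = len(keys) // 2
    promoted = keys[mid]
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, right, promoted


# Order 4: inserting 18 into the leaf [67, 89, 123] overflows it.
print(split_leaf([18, 67, 89, 123], 4))
print(split_internal([36, 67, 89, 104], ["c0", "c1", "c2", "c3", "c4"]))
```

In the leaf case the promoted key 89 still appears in the right leaf, because leaves must hold every key; in the internal case 89 appears only in the parent.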

Slide 17: Building a B+Tree
- B+Tree of order 4 (4-ary tree).
- Insert 67, 123, 89, 18, 34, 87, 99, 104, 36, 55, 78, 9.
- Initially the root is a leaf node.
- A split at a leaf node promotes the key but retains a copy.
- Why promote 89, not 67?
[Figure: root node, data records, and the first leaf split; pointer labels < and >=]

Slide 18: Building a B+Tree
Insertion sequence: 67, 123, 89, 18, 34, 87, 99, 104, 36, 55, 78, 9.
[Figure: the root node before and after a further leaf split; pointer labels < and >=]

Slide 19: Building a B+Tree
Insertion sequence: 67, 123, 89, 18, 34, 87, 99, 104, 36, 55, 78, 9.
[Figure: the root node before and after another split; pointer labels < and >=]

Slide 20: Building a B+Tree
Insertion sequence: 67, 123, 89, 18, 34, 87, 99, 104, 36, 55, 78, 9.
- A split at a non-leaf node promotes the key and does not retain a copy.
- The splitting of nodes proceeds upwards until a node that isn't full is found.
- In the worst case, the root node is split, increasing the height of the tree by 1.
[Figure: a double node split; pointer label <]
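The whole build can be replayed with a compact insert routine. This is a sketch under stated assumptions, not the lecture's code: keys only (no data records), a leaf splits when it reaches M keys, an internal node splits when it exceeds M children, and the class and method names are hypothetical. The intermediate trees may differ from the slides' figures if those use a different split convention.

```python
import math
from bisect import bisect_right, insort


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children    # None for a leaf
        self.next_leaf = None


class BPlusTree:
    def __init__(self, order):
        self.order = order          # M: maximum children per internal node
        self.root = Node()          # initially the root is a leaf

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:       # root was split: the tree grows by one level
            promoted, right = split
            self.root = Node([promoted], [self.root, right])

    def _insert(self, node, key):
        """Insert key below node; return (promoted_key, new_right_node) on a split."""
        if node.children is None:
            insort(node.keys, key)
            return self._split_leaf(node) if len(node.keys) >= self.order else None
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        promoted, right = split
        node.keys.insert(i, promoted)
        node.children.insert(i + 1, right)
        return self._split_internal(node) if len(node.children) > self.order else None

    def _split_leaf(self, leaf):
        cut = math.ceil(self.order / 2)
        right = Node(leaf.keys[cut:])
        leaf.keys = leaf.keys[:cut]
        right.next_leaf, leaf.next_leaf = leaf.next_leaf, right
        return right.keys[0], right     # promote a copy; the key stays in the leaf

    def _split_internal(self, node):
        mid = len(node.keys) // 2
        promoted = node.keys[mid]       # moves up; kept in neither half
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return promoted, right

    def leaves(self):
        """Yield the key list of every leaf, left to right, via the leaf links."""
        node = self.root
        while node.children is not None:
            node = node.children[0]
        while node is not None:
            yield list(node.keys)
            node = node.next_leaf


tree = BPlusTree(order=4)
for key in [67, 123, 89, 18, 34, 87, 99, 104, 36, 55, 78, 9]:
    tree.insert(key)
print(tree.root.keys, list(tree.leaves()))
```

Under these assumptions, inserting 55 triggers the double split: the leaf overflows, its promoted key overflows the root, and the root split adds a level.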

Slide 21: Deletion on B+Trees
- Remove the (search-key value, pointer) pair from the leaf node.
- If the node has too few entries due to the removal (minimum requirement: ⌈M/2⌉ children), and the entries in the node and a sibling fit into a single node, then:
  - Merge the two nodes into a single node.
  - Delete the pair (K_(i-1), P_i), where P_i is the pointer to the deleted node, from its parent, recursively using the above procedure.

Slide 22: Deletion on B+Trees
- Otherwise, if the node has too few entries due to the removal, and the entries in the node and a sibling do not fit into a single node, then:
  - Redistribute the pointers between the node and a sibling such that both have at least the minimum number of entries.
  - Update the corresponding search-key value in the parent of the node.
- The node deletions may cascade upwards until a node which has ⌈M/2⌉ or more pointers is found.
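The choice between merging and redistributing depends only on whether the two nodes' entries fit into one node. A leaf-level sketch on plain key lists, assuming a leaf of an order-M tree holds at most M-1 keys and that `node` is the left neighbour of `sibling`; `fix_underflow` is a hypothetical name.

```python
def fix_underflow(node, sibling, m):
    """Repair a leaf that has too few keys after a deletion.

    node, sibling: sorted key lists of two adjacent leaves, node on the left.
    Returns ("merge", merged_keys) when everything fits into one leaf, or
    ("redistribute", left_keys, right_keys, new_separator) otherwise.
    """
    max_keys = m - 1                # assumed leaf capacity
    combined = node + sibling
    if len(combined) <= max_keys:
        # The parent must now delete the separator key and the pointer
        # to the removed leaf, which may cause an underflow further up.
        return ("merge", combined)
    cut = len(combined) // 2
    left, right = combined[:cut], combined[cut:]
    # The parent's separator becomes the smallest key of the right leaf.
    return ("redistribute", left, right, right[0])


# Order 4: deleting 34 from the leaf [18, 34] leaves [18].
print(fix_underflow([18], [36, 55], 4))        # entries fit into one leaf
print(fix_underflow([18], [36, 55, 60], 4))    # entries do not fit
```

In the second call both leaves end up with exactly two keys, so redistribution can leave nodes at the minimum rather than above it.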

Slide 23: Summary
- The B-Tree is mostly used as an external data structure for databases.
- A B-Tree of order M has the following properties:
  - All non-leaf nodes (except the root, which is not bound by a lower limit) have between ⌈M/2⌉ and M children.
  - A non-leaf node that has n branches contains n-1 keys.
  - All leaves are at the same level, that is, the same depth from the root.
- The B+Tree is a variant of the B-Tree in which all key values are stored in the leaves.