B-Trees Text B-Tree Objects Building a B-Tree Read Weiss, §19.8

Presentation transcript:

B-Trees
- Text: Read Weiss, §19.8
- B-Tree Objects: Leaf, Interior Node, Information Packet, B-Tree
- Building a B-Tree: sequence of inserts, leaf splitting, node splitting, creation of root nodes, role of packet

Motivation
- Discussion and analysis of BSTs, AVL trees, and splay trees assumed that the trees were entirely contained in memory
- Unrealistic: databases are typically larger than available memory
- Example:
  - 10 million records ≈ 10M records = 10 * 2^20 records
  - each record has a total of 512 = 2^9 bytes
  - total bytes required: 10 * 2^20 * 2^9 = 10 * 2^29 bytes = 5 Gbytes
  - user is one of 20 on the system, so the application has 1/20 of the resources
  - 1/20 of a 32-Gbyte memory = 1.6 Gbytes < 5 Gbytes
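The arithmetic in this example can be checked with a short sketch. The variable names are illustrative and not part of the slides; "Gbyte" is taken to mean 2^30 bytes, which is the reading that makes 10 * 2^29 bytes come out to exactly 5 Gbytes.

```python
# Sketch of the slide's memory estimate; all names are hypothetical.
RECORDS = 10 * 2**20          # "10M" records, with M = 2^20
BYTES_PER_RECORD = 2**9       # 512 bytes per record
GBYTE = 2**30                 # assumption: 1 Gbyte = 2^30 bytes

total_bytes = RECORDS * BYTES_PER_RECORD
total_gbytes = total_bytes / GBYTE

USERS = 20
MEMORY_GBYTES = 32
share_gbytes = MEMORY_GBYTES / USERS   # this application's share of memory

fits_in_memory = total_gbytes <= share_gbytes

print(f"data: {total_gbytes} Gbytes, share: {share_gbytes} Gbytes, fits: {fits_in_memory}")
```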

Motivation
- Differences between memory and disk speeds
- Example:
  - Memory: 25 MIPS machine = 25 million instructions/sec
  - Disk: concentric tracks
    - 3600 rev/min, so one revolution takes 1/60 sec = 16.7 ms
    - ignoring read/write head movement, average access time = 8.3 ms (half a revolution)
    - more realistically, 9-11 ms access time
    - 120 disk accesses/sec (each access typically gathers several clusters of data, where 1 cluster = multiple sectors of data)
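The disk figures follow from the rotation speed alone. The sketch below assumes the 3600 figure is revolutions per minute, since that is what makes one revolution take 1/60 sec; the variable names are illustrative.

```python
# Rotational latency from spindle speed; names are hypothetical.
RPM = 3600                                  # assumption: revolutions per minute

rev_per_sec = RPM / 60                      # 60 revolutions each second
ms_per_rev = 1000 / rev_per_sec             # about 16.7 ms per revolution
avg_latency_ms = ms_per_rev / 2             # on average, wait half a revolution
accesses_per_sec = 1000 / avg_latency_ms    # ignoring head movement

print(f"{ms_per_rev:.1f} ms/rev, {avg_latency_ms:.1f} ms average, "
      f"{accesses_per_sec:.0f} accesses/sec")
```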

Motivation
- Simple overview of memory and disk storage: http://www.jegsworks.com/Lessons/lessonintro.htm (click on “Storage”)
- Short paper comparing 512-byte vs 4K-byte sectors: http://www.usenix.org/publications/library/proceedings/fast02/wips/mccarthy.pdf

Motivation: Number of accesses
- Assume N = 10M, so log2 N = log2 10M ≈ 24
- BST (worst case) = O(N): access 10M records
- BST (average case) = 1.38 log N accesses: 1.38 * 24 ≈ 33 accesses
- AVL (worst case) = 1.44 log N accesses: 1.44 * 24 ≈ 34 accesses
- AVL (average case) ≈ log N accesses ≈ 24 accesses
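A small sketch reproduces these access counts. It assumes log2 N is rounded up to 24 and that the products are truncated to whole accesses, which is what yields the slide's 33 and 34; the names are illustrative.

```python
import math

# Access-count estimates for N = 10M records; names are hypothetical.
N = 10 * 2**20
log_n = math.ceil(math.log2(N))        # log2(10M) is about 23.3, rounded up

bst_worst = N                          # degenerate BST: one access per record
bst_average = int(1.38 * log_n)        # truncated, as on the slide
avl_worst = int(1.44 * log_n)          # truncated, as on the slide
avl_average = log_n

print(log_n, bst_average, avl_worst, avl_average)
```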

Motivation: Access Time
- Assume each access takes a total of 160 ms (seek time + latency + data transmission time)
- BST (worst case): 10M accesses = 10 * 2^20 * 160 ms = 19.4 days
- BST (average case): 33 accesses = 5.28 seconds
- AVL (worst case): 34 accesses = 5.44 seconds
- AVL (average case): 24 accesses = 3.84 seconds
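The timings are the access counts multiplied by the assumed 160 ms per access. The sketch below works in integer milliseconds to avoid rounding noise; the names are illustrative.

```python
# Convert access counts to elapsed time at 160 ms per access; names are hypothetical.
MS_PER_ACCESS = 160
MS_PER_DAY = 24 * 60 * 60 * 1000

def total_ms(accesses):
    """Total time in milliseconds for the given number of disk accesses."""
    return accesses * MS_PER_ACCESS

bst_worst_days = total_ms(10 * 2**20) / MS_PER_DAY
bst_average_ms = total_ms(33)
avl_worst_ms = total_ms(34)
avl_average_ms = total_ms(24)

print(f"BST worst: {bst_worst_days:.1f} days")
print(f"BST average: {bst_average_ms / 1000} s, AVL worst: {avl_worst_ms / 1000} s, "
      f"AVL average: {avl_average_ms / 1000} s")
```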

Motivation: Ideal Access Time
- 24–33 accesses (3-6 seconds) is still too long
- ideally 4-5 accesses (0.64 - 0.8 seconds)
- Candidate Data Structure: B-Trees

Definition
A B-tree of order M is an M-ary tree such that:
- The data items are stored at the leaves
- The interior nodes store up to M−1 keys to guide the searching; key i represents the smallest key in subtree (i+1)
- The root is either a leaf or has between two and M children
- All interior nodes (except possibly the root) have between ⌈M/2⌉ and M children
- All leaves are at the same depth and have between ⌈L/2⌉ and L data items
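How the interior keys guide a search can be sketched in a few lines. This is a minimal illustration rather than the course's implementation: the `Leaf`, `Interior`, and `find` names are assumptions, and the sample tree is the three-leaf arrangement used later in the deck. Because each key is the smallest key of the subtree to its right, the number of keys less than or equal to the search value is the index of the child to descend into.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class Leaf:                 # hypothetical name: holds the data items, sorted
    records: list

@dataclass
class Interior:             # hypothetical name: up to M-1 keys and M children
    keys: list
    children: list

def find(node, x):
    """Descend from node to the only leaf that could hold x; report whether it does."""
    while isinstance(node, Interior):
        # keys[i] is the smallest key in children[i + 1], so count the keys <= x
        node = node.children[bisect_right(node.keys, x)]
    return x in node.records

# Interior node "● 10 ● 17 ●" over leaves [5 7], [10 14], [17 24]
tree = Interior([10, 17], [Leaf([5, 7]), Leaf([10, 14]), Leaf([17, 24])])
print(find(tree, 14), find(tree, 17), find(tree, 6))
```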

B-Tree objects (figure)
- B-Tree object: root
- Interior Node, M=3: ● 10 ● _ ● (contains references to all children and the smallest keys of all but one child subtree)
- Leaf, L=3: [5 7 __] [10 14 __] [17 24 __] (contains records in sorted order)
- Information Packet: ● 17 (this one contains a reference to a newly created Leaf and the smallest key in the leaf)

Choosing M
- Each node represents a disk block
- For example, let a block be 8K = 2^13 bytes
- An interior node can hold M−1 keys and M disk references
- Assume a key requires 32 bytes
- Assume a disk reference requires 4 bytes
- 32*(M−1) + 4*M ≤ 2^13
- 36*M − 32 ≤ 8192
- M ≤ 228
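The bound on M can be solved and checked mechanically. In this sketch the constant and function names are illustrative; the sizes are the ones assumed on the slide.

```python
# Largest M such that M-1 keys and M disk references fit in one block.
BLOCK_BYTES = 2**13     # 8K block
KEY_BYTES = 32
REF_BYTES = 4

def node_bytes(m):
    """Bytes used by an interior node with m children (hypothetical helper)."""
    return KEY_BYTES * (m - 1) + REF_BYTES * m

# Rearranging 32*(M-1) + 4*M <= 8192 gives M <= (8192 + 32) / 36
M = (BLOCK_BYTES + KEY_BYTES) // (KEY_BYTES + REF_BYTES)

print(M, node_bytes(M), node_bytes(M + 1))
```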

Choosing L
- Each node represents a disk block
- Let a block be 8K = 2^13 bytes
- A leaf can hold L records
- Assume a record requires 256 bytes
- 256*L ≤ 2^13
- 2^8 * L ≤ 2^13
- L ≤ 2^5 = 32
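With L fixed, it is possible to estimate roughly how many levels a tree over the 10M-record example would need. The sketch below is an estimate under stated assumptions, not a result from the slides: it takes M = 228 from the preceding slide, assumes every leaf and interior node is only half full, and ignores the special case of a root with as few as two children. The function name is illustrative.

```python
import math

BLOCK_BYTES = 2**13
RECORD_BYTES = 256
L = BLOCK_BYTES // RECORD_BYTES     # records per leaf
M = 228                             # children per interior node, from "Choosing M"

def estimated_levels(n_records, m, l):
    """Levels (leaf level included) when every node holds only its minimum."""
    nodes = math.ceil(n_records / math.ceil(l / 2))   # number of half-full leaves
    levels = 1
    while nodes > 1:
        nodes = math.ceil(nodes / math.ceil(m / 2))   # parents with minimum fan-out
        levels += 1
    return levels

levels = estimated_levels(10 * 2**20, M, L)
print(L, levels)
```

Under these assumptions the estimate is 4 levels, which is in line with the 4-5 accesses named earlier as the ideal.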

Example Step-Through
- Assume an order 3 B-Tree, i.e., L = M = 3
- Each leaf can hold at most 3 records
- Each interior node can hold at most 2 keys and 3 references to children

Cast of Objects
- B-Tree object: root
- Interior Node, M=3: ● 10 ● _ ● (contains references to all children and the smallest keys of all but one child subtree)
- Leaf, L=3: [5 7 __] [10 14 __] [17 24 __] (contains records in sorted order)
- Information Packet: ● 17 (this one contains a reference to a newly created Leaf and the smallest key in the leaf)

Start with an empty B-Tree
Tree: root = null

Insert: 10
Tree: root = null
Calling method passes the value to be inserted (10) to the B-Tree insert() method.

Insert: 10
Tree: root → [ ] (new empty Leaf)
B-Tree object receives the value, realizes the tree is empty, and creates a new empty Leaf.

Insert: 10
Root sends the value to the Leaf, i.e., passes the value to the Leaf.insert() method.

Insert: 10
Tree: root → [10]
Leaf stores the value.

Insert: 5
Tree: root → [5 10]
Leaf maintains records in sorted order.

Insert: 7
Tree: root → [5 7 10]
Leaf maintains records in sorted order.

Insert: 14
Tree: root → [5 7 10 14]
Number of records in the leaf (4) is greater than the order (3) of the B-Tree, so the Leaf must split.

Insert: 14
Tree: root → [5 7]; new Leaf [10 14]
Old Leaf creates a new Leaf and moves the greater values into the new Leaf.

Insert: 14
Tree: root → [5 7]; new Leaf [10 14]; Packet: 10
Old Leaf creates an Information Packet to return to its parent. What must the Packet contain? (As on the Cast of Objects slide: a reference to the newly created Leaf and the smallest key in it, here 10.)

Insert: 14
Tree: root → (● 10 ● _ ●) → leaves [5 7] [10 14]; Packet: 10
Root receives the Packet and uses it to create a new Interior Node, which becomes the new root.

Tree: root → (● 10 ● _ ●) → leaves [5 7] [10 14]
Insert is complete and the Packet is trashed.
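The leaf-splitting step just walked through can be sketched as follows. This is an illustration under assumptions, not the assignment's required design: the slides name a Leaf.insert() method but show no code, so the `Leaf` class body, the choice of a `(key, new_leaf)` tuple as the Information Packet, and the rule of moving the upper half of the records are all hypothetical.

```python
from bisect import insort

L = 3  # maximum number of records in a leaf

class Leaf:
    def __init__(self, records=None):
        self.records = list(records or [])

    def insert(self, value):
        """Store value in sorted order.

        Returns None if the leaf still fits, otherwise a packet
        (smallest key of the new leaf, reference to the new leaf).
        """
        insort(self.records, value)
        if len(self.records) <= L:
            return None
        mid = len(self.records) // 2
        new_leaf = Leaf(self.records[mid:])    # greater values move to the new leaf
        self.records = self.records[:mid]
        return (new_leaf.records[0], new_leaf)

leaf = Leaf()
packets = [leaf.insert(v) for v in (10, 5, 7, 14)]
key, new_leaf = packets[-1]                    # only the fourth insert splits
print(leaf.records, new_leaf.records, key)
```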

Insert: 24
Tree: root → (● 10 ● _ ●) → leaves [5 7] [10 14]
Root sends the value to the interior node.

Insert: 24
Tree: root → (● 10 ● _ ●) → leaves [5 7] [10 14]
Interior Node determines to which of its children to send the value.

Insert: 24
Tree: root → (● 10 ● _ ●) → leaves [5 7] [10 14 24]
Leaf receives and stores the value.

Insert: 17
Tree: root → (● 10 ● _ ●) → leaves [5 7] [10 14 24]
Root sends the value to the interior node.

Insert: 17
Tree: root → (● 10 ● _ ●) → leaves [5 7] [10 14 24]
Interior Node determines to which of its children to send the value.

Insert: 17
Tree: root → (● 10 ● _ ●) → leaves [5 7] [10 14 17 24]
Leaf receives and stores the value.

Insert: 17
Tree: root → (● 10 ● _ ●) → leaves [5 7] [10 14]; new Leaf [17 24]
Leaf recognizes that it is overfull. It creates a new Leaf and moves the larger values into the new Leaf.

Insert: 17
Tree: root → (● 10 ● _ ●) → leaves [5 7] [10 14]; new Leaf [17 24]; Packet: 17
Leaf also creates an Information Packet to return to its parent.

Insert: 17
Tree: root → (● 10 ● 17 ●) → leaves [5 7] [10 14] [17 24]; Packet: 17
Interior Node receives the Packet and uses it to set a link to the newly created Leaf.

Tree: root → (● 10 ● 17 ●) → leaves [5 7] [10 14] [17 24]
Insert is complete and the Packet is trashed.

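Insert: 6
Tree: root → (● 10 ● 17 ●) → leaves [5 7] [10 14] [17 24]
Root sends the value to the interior node.

Insert: 6
Tree: root → (● 10 ● 17 ●) → leaves [5 7] [10 14] [17 24]
Interior Node determines to which of its children to send the value.

Insert: 6
Tree: root → (● 10 ● 17 ●) → leaves [5 6 7] [10 14] [17 24]
Leaf receives and stores the value.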

Insert: 4
Tree: root → (● 10 ● 17 ●) → leaves [5 6 7] [10 14] [17 24]
Root sends the value to the interior node.

Insert: 4
Tree: root → (● 10 ● 17 ●) → leaves [5 6 7] [10 14] [17 24]
Interior Node determines to which of its children to send the value.

Insert: 4
Tree: root → (● 10 ● 17 ●) → leaves [4 5 6 7] [10 14] [17 24]
Leaf recognizes that it is overfull.

Insert: 4
Tree: root → (● 10 ● 17 ●) → leaves [4 5] [10 14] [17 24]; new Leaf [6 7]
Leaf creates a new Leaf and moves the larger values to the new Leaf.

Insert: 4
Tree: root → (● 10 ● 17 ●) → leaves [4 5] [10 14] [17 24]; new Leaf [6 7]; Packet: 6
Leaf also creates an Information Packet that contains information about the new Leaf. The Packet will be sent to the old Leaf’s parent.

Insert: 4
Tree: root → (● 6 ● 10 ● 17 ●) → leaves [4 5] [6 7] [10 14] [17 24]; Packet: 6
Interior Node uses the information in the Packet it receives to update its key information.

Insert: 4
Tree: root → (● 6 ● 10 ● 17 ●) → leaves [4 5] [6 7] [10 14] [17 24]
Interior Node realizes that it is overfull.

Insert: 4
Tree: root → (● 6 ● 10 ●); new Node (● 17 ● _ ●); leaves [4 5] [6 7] [10 14] [17 24]
Interior Node creates a new Node and stores the larger value and links in the new Node.

Insert: 4
Tree: root → (● 6 ● _ ●); new Node (● 17 ● _ ●); leaves [4 5] [6 7] [10 14] [17 24]; Packet: 10
Interior Node also creates an Information Packet containing information about the new Node.

Insert: 4
Tree: root → (● 6 ● _ ●); new Node (● 17 ● _ ●); leaves [4 5] [6 7] [10 14] [17 24]; Packet: 10
B-Tree object receives the Packet, which tells it that it should create a new root Node.

Insert: 4
Tree: root → (● 10 ● _ ●) → nodes (● 6 ● _ ●) (● 17 ● _ ●) → leaves [4 5] [6 7] [10 14] [17 24]; Packet: 10
B-Tree creates a new root Node and initializes its links.

Tree: root → (● 10 ● _ ●) → nodes (● 6 ● _ ●) (● 17 ● _ ●) → leaves [4 5] [6 7] [10 14] [17 24]
Insert is complete and the Packet is trashed.
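The whole step-through can be replayed with a compact sketch. It is one possible reading of the slides under stated assumptions, not the assignment solution: the class and method bodies are hypothetical, an Information Packet is modeled as a `(key, node)` tuple, and an overfull node keeps its lower half and moves the upper half to the new node, which is the split the slides show for M = L = 3.

```python
from bisect import bisect_right, insort

M = 3  # maximum children per interior node
L = 3  # maximum records per leaf

class Leaf:
    def __init__(self, records):
        self.records = records

    def insert(self, value):
        insort(self.records, value)               # keep records sorted
        if len(self.records) <= L:
            return None
        mid = len(self.records) // 2
        new_leaf = Leaf(self.records[mid:])       # larger values move out
        del self.records[mid:]
        return new_leaf.records[0], new_leaf      # packet for the parent

class Interior:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

    def insert(self, value):
        i = bisect_right(self.keys, value)        # child that must hold value
        packet = self.children[i].insert(value)
        if packet is None:
            return None
        key, node = packet
        self.keys.insert(i, key)                  # link the newly created child
        self.children.insert(i + 1, node)
        if len(self.children) <= M:
            return None
        mid = len(self.children) // 2
        new_node = Interior(self.keys[mid:], self.children[mid:])
        up_key = self.keys[mid - 1]               # smallest key under new_node
        del self.keys[mid - 1:]
        del self.children[mid:]
        return up_key, new_node

class BTree:
    def __init__(self):
        self.root = None

    def insert(self, value):
        if self.root is None:
            self.root = Leaf([])                  # empty tree: create a leaf
        packet = self.root.insert(value)
        if packet is not None:                    # old root split: grow upward
            key, node = packet
            self.root = Interior([key], [self.root, node])

def dump(node):
    """Nested-structure view of the tree, for inspection only."""
    if isinstance(node, Leaf):
        return node.records
    return {"keys": node.keys, "children": [dump(c) for c in node.children]}

tree = BTree()
for v in (10, 5, 7, 14, 24, 17, 6, 4):
    tree.insert(v)
print(dump(tree.root))
```

Returning the packet up the call chain means each node needs no parent pointer: a split is reported to whoever called insert(), and only the B-Tree object ever creates a new root.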

Assignment #2 B-Trees