CSE 373 Data Structures and Algorithms

Slides:



Advertisements
Similar presentations
CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010.
Advertisements

Other time considerations Source: Simon Garrett Modifications by Evan Korth.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
CS 206 Introduction to Computer Science II 12 / 01 / 2008 Instructor: Michael Eckmann.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
Tirgul 6 B-Trees – Another kind of balanced trees Problem set 1 - some solutions.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
B-Trees and B+-Trees Disk Storage What is a multiway tree?
(B+-Trees, that is) Steve Wolfman 2014W1
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
Tirgul 6 B-Trees – Another kind of balanced trees.
Splay Trees and B-Trees
1 B-Trees Section AVL (Adelson-Velskii and Landis) Trees AVL tree is binary search tree with balance condition –To ensure depth of the tree is.
1 Multiway trees & B trees & 2_4 trees Go&Ta Chap 10.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
B-Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it.
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
CPSC 221: Algorithms and Data Structures Lecture #7 Sweet, Sweet Tree Hives (B+-Trees, that is) Steve Wolfman 2010W2.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
CSE 326 Killer Bee-Trees David Kaplan Dept of Computer Science & Engineering Autumn 2001 Where was that root?
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
CompSci Memory Model  For this course: Assume Uniform Access Time  All elements in an array accessible with same time cost  Reality is somewhat.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
CSE 326: Data Structures Lecture #9 Big, Bad B-Trees Steve Wolfman Winter Quarter 2000.
COMP261 Lecture 23 B Trees.
Unit 9 Multi-Way Trees King Fahd University of Petroleum & Minerals
B-Trees B-Trees.
Multiway Search Trees Data may not fit into main memory
B-Trees .
Btrees Deletion.
B-Trees B-Trees.
CSE 332 Data Abstractions B-Trees
B+-Trees.
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
Btrees Insertion.
B+-Trees.
B+-Trees.
B+ Tree.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
Trees 4 The B-Tree Section 4.7
CMSC 341 Lecture 10 B-Trees Based on slides from Dr. Katherine Gibson.
BTrees.
Data Structures and Algorithms
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
CSE 326: Data Structures Lecture #9 AVL II
B- Trees D. Frey with apologies to Tom Anastasio
B- Trees D. Frey with apologies to Tom Anastasio
B-Tree.
B+-Trees (Part 1).
Other time considerations
Multiway Trees Searching and B-Trees Advanced Tree Structures
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
CSIT 402 Data Structures II With thanks to TK Prasad
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B- Trees D. Frey with apologies to Tom Anastasio
Solution for Section Worksheet 4, #7b & #7c
CSE 373: Data Structures and Algorithms
Lecture 13: Computer Memory
CSE 373: Data Structures and Algorithms
B-Trees.
B-Trees.
CSE 326: Data Structures Lecture #10 B-Trees
Presentation transcript:

CSE 373 Data Structures and Algorithms Lecture 25: B-Trees

Assumptions of Algorithm Analysis Big-Oh assumes all operations take the same amount of time Is this really true?

Disk Based Data Structures All data structures we have examined are limited to main memory Have been assuming data fits into main memory Counter-example: Transaction data of a bank > 1 GB per day Uses secondary storage (e.g., hard disks) Operations: insert, delete, searches

Cycles* to access: CPU Registers 1 Cache tens Main memory hundreds Disk millions *cycle: time it takes to execute an instruction

Hard Disks Large amount of storage but slow access Identifying a particular block takes a long time Read or write data in chunks (“a page”) of 0.5 – 8 KB in size (Newer technology) Solid- state drives are 50 – 100 times faster Still “slow”

Algorithm Analysis Running time of disk-based data structures measured in terms of Computing time (CPU) Number of disk accesses Regular main-memory algorithms that work one data element at a time can not be "ported" to secondary storage in a straight forward way

Principles Almost all of our data structure is on disk. Every time we access a node in the tree it amounts to a random disk access. How can we address this problem?

M-ary Search Tree Suppose we devised a search tree with branching factor M: M – 1 keys needed to decide branch to take Complete tree has height: # Nodes accessed for search: So, we’ll try to solve this problem as we did with heaps. Here’s the general idea. We create a search tree with a branching factor of M. Each node has M-1 keys and we search between them. What’s the runtime? O(logMn)? That’s a nice thought, and it’s the best case. What about the worst case? (logMn) (logMn)

B-Trees Internal nodes store (up to) M  1 keys Order property: Subtree between two keys x and y contain leaves with values v such that x  v < y Note the “” Leaf nodes contain up to L sorted values/ data items (“records”). 3 7 12 21 x<3 3x<7 7x<12 12x<21 21x M = 7 To address these problems, we’ll use a slightly more structured M-ary tree: B-Trees. As before, each internal node has M-1 keys. To manage memory problems, we’ll tune the size of a node (or leaf) to the size of a memory unit. Usually, a page or disk block.

B-Tree Structure Properties Root (special case) Has between 2 and M children (or could be a leaf) Internal nodes Stores up to M1 keys Have between ceiling(M/2) and M children Leaf nodes Where data is stored Contains between ceiling(L/2) and L data items Nodes are at least ½ full The properties of B-Trees (and the trees themselves) are a bit more complex than previous structures we’ve looked at. Here’s a big, gnarly list; we’ll go one step at a time. The maximum branching factor, as we said, is M (tunable for a given tree). The root has between 2 and M children of at most L keys. (L is another parameter) These restrictions will be different for the root than for other nodes. Leaves are at least ½ full The tree is perfectly balanced !

B-Tree: Example B-Tree with M = 4 (# pointers in internal node) and L = 5 (# data items in leaf) Data objects… which we’ll ignore in slides 12 44 6 20 27 34 50 1, AB.. 6 8 9 10 12 20 27 34 44 47 49 50 This is just an example B-tree. Notice that it has 24 entries with a depth of only 2. A BST would be 4 deep. Notice also that the leaves are at the same level in the tree. I’ll use integers as both key and data, but we all know that that could as well be different data at the bottom, right? 2, GH.. 14 22 28 38 60 All leaves at the same depth 4, XY.. 16 24 32 39 70 17 41 19 Definition for later: “neighbor” is the next sibling to the left or right.

Disk Friendliness Many keys stored in a node Each node is one disk page/block. All brought to memory/cache in one disk access. Internal nodes contain only keys; only leaf nodes contain actual data What is limiting you from increasing the number of keys stored in each node? The page size

Exercise: Computing M and L Exercise: If disk block is 4000 bytes, key size is 20 bytes, pointer size is 4 bytes, and data/value size is 200 bytes, what should M (# of branches) and L (# of data items per leaf) be for our B-Tree? Solve for M: M - 1 keys + M pointers = 20M - 20 + 4M = 24M – 20 24M - 20 <= 4000 M = 167 Solve for L: L = 4000 / 200 L = 20

B-trees vs. AVL trees Suppose we have n = 109 (billion) data items: Depth of AVL Tree: log2 109 = 30 Depth of B-Tree with M = 256, L = 256: ~ log M / 2109 = log 128109 = 4.3 (M/2 because keys half-full) x = 2^30 1.44 log_phi x = 43 Log_128 x = 4.3 Wow!!! 14 14

Building a B-Tree with Insertions 3 3 3 Insert(3) Insert(18) Insert(14) 18 14 18 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? The empty B-Tree M = 3 L = 3

3 3 3 18 Insert(30) 14 14 14 30 18 18 30 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? M = 3 L = 3

18 18 18 32 3 18 3 18 3 18 32 Insert(32) Insert(36) 14 30 14 30 14 30 36 32 18 32 Insert(15) Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? 3 18 32 14 30 36 M = 3 L = 3 15

18 32 18 32 3 18 32 3 18 32 Insert(16) 14 30 36 14 30 36 15 15 16 18 32 15 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? 18 3 15 18 32 15 32 14 16 30 36

M = 3 L = 3 Insert(12,40,45,38) 18 18 15 32 15 32 40 3 15 18 32 3 15 18 32 40 14 16 30 36 12 16 30 36 45 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? 14 38

Insertion Algorithm: The Overflow Step K0 K6 K0 K3 K6 K1 K2 K3 K4 K5 K1 K2 K4 K5 Too big M = 5

Insertion Algorithm Insert the key in its leaf in sorted order If the leaf ends up with L+1 items, overflow! Split the leaf into two nodes with each half of the data. Add the new leaf to the parent. If the parent ends up with M+1 children, overflow! OK, here’s that process as an algorithm. The new funky symbol is floor; that’s just like regular C++ integer division. Notice that this can propagate all the way up the tree. How often will it do that? Notice that the two new leaves or internal nodes are guaranteed to have enough items (or subtrees). Because even the floor of (L+1)/2 is as big as the ceiling of L/2.

Insertion Algorithm 3. If an internal node ends up with M+1 children, overflow! Split the internal node into two nodes each with half the keys. Add the new child to the parent If the parent ends up with M+1 items, overflow! If the root ends up with M+1 children, split it in two, and create new root with two children This makes the tree deeper! OK, here’s that process as an algorithm. The new funky symbol is floor; that’s just like regular C++ integer division. Notice that this can propagate all the way up the tree. How often will it do that? Notice that the two new leaves or internal nodes are guaranteed to have enough items (or subtrees). Because even the floor of (L+1)/2 is as big as the ceiling of L/2.

And Now for Deletion… 18 Delete(32) 18 15 32 40 15 40 3 15 18 32 40 3 36 40 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? 12 16 30 36 45 12 16 30 38 45 14 38 14 M = 3 L = 3

Are you using that 14? Can I borrow it? M = 3 L = 3 18 Delete(15) 18 15 36 40 16 36 40 3 15 18 36 40 3 16 18 36 40 12 16 30 38 45 12 30 38 45 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? 14 14 Are we okay? Are you using that 14? Can I borrow it? M = 3 L = 3 Leaf not half full!

18 18 16 36 40 14 36 40 3 16 18 36 40 3 14 18 36 40 12 30 38 45 12 16 30 38 45 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? 14 M = 3 L = 3

18 Delete(16) 18 14 36 40 14 36 40 3 14 18 36 40 3 14 18 36 40 12 16 30 38 45 12 30 38 45 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? M = 3 L = 3 Are you using that 14?

18 18 14 36 40 36 40 3 14 18 36 40 3 18 36 40 12 30 38 45 12 30 38 45 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? 14 M = 3 L = 3 Are you using the node18/30?

18 36 36 40 18 40 3 18 36 40 3 18 36 40 12 30 38 45 12 30 38 45 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? 14 14 M = 3 L = 3

Delete(14) 36 36 18 40 18 40 3 18 36 40 3 18 36 40 12 30 38 45 12 30 38 45 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? 14 M = 3 L = 3

Delete(18) 36 36 18 40 18 40 3 18 36 40 3 30 36 40 12 30 38 45 12 38 45 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? M = 3 L = 3

36 36 18 40 40 3 30 36 40 3 36 40 12 38 45 12 38 45 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? 30 M = 3 L = 3

36 40 36 40 3 36 40 3 36 40 12 38 45 12 38 45 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? 30 30 M = 3 L = 3

36 40 36 40 3 36 40 12 38 45 3 36 40 30 12 38 45 Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? 30 M = 3 L = 3

Deletion Algorithm: Rotation Step K0 K2 K0 K3 K1 K3 K4 K5 K1 K2 K4 K5 Too small This is left rotation. Similarly, right rotation M = 5

Deletion Algorithm: Merging Step Too small! K2 K1 K3 K4 K1 K2 K3 K4 Can’t get smaller Too small M = 5

Deletion Algorithm Remove the key from its leaf If the leaf ends up with fewer than L/2 items, underflow! Try a left rotation If not, try a right rotation If not, merge, then check the parent node for underflow Alright, that’s deletion. Let’s talk about a few of the details. Why will dumping keys always work? If the neighbors were too low on keys to loan any, they must have L/2 keys, but we have one fewer. Therefore, putting them together, we get at most L, and that’s legal.

Deletion Slide Two If an internal node ends up with fewer than M/2 children, underflow! Try a left rotation If not, try a right rotation If not, merge, then check the parent node for underflow If the root ends up with only one child, make the child the new root of the tree The same applies here for dumping subtrees as on the previous slide for dumping keys. This reduces the height of the tree!