1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 19: B-Trees: Data Structures for Disk
2 Data Structures & Memory So far we’ve seen data structures stored in main memory. What happens when you have a very large set of data? –Too slow to load it into memory –Might not fit into memory If you use virtual memory, the paging behavior changes the running-time expectations: a very large array of length N stored in virtual memory takes much longer to access if most of the data is on pages that are paged out.
3 Data Structures on Disk For very large sets of information, we often need to keep most of it on disk Examples: –Information retrieval systems –Database systems To handle this efficiently: –Keep an index in memory –Keep the data on disk –The index contains pointers to the data on the disk Two most common techniques: –Hash tables and B-trees
4 Disk vs. RAM Disk has much larger capacity than RAM Disk is much slower to access than RAM
5 Images copyright 2003 Pearson Education RAM: Memory cells arranged by address
6 Images copyright 2003 Pearson Education Disk: Memory cells arranged by address
7 Disk Structure and Operation Made up of platters –Like a phonograph record Divided into –tracks (rings) and –sectors (wedges) Sectors are divided into fixed-size blocks. As the disk spins beneath it, the read/write arm reads the data from the block(s) of interest –Data is read into a memory buffer in the OS –This eventually gets transferred to RAM –The process has to wait to get this resource
8 Disk Access Time Seek Time –The time required to move the read/write heads over the disk surface to the required track. –Roughly proportional to the distance the heads must move. Rotational Latency –The time taken, after the completion of the seek, for the disk platter to spin until the first sector addressed passes under the read/write heads. –On average, this is half of a full rotation. Transfer Time –The time taken for the disk platter to spin until all the addressed sectors have passed under the heads. –Directly proportional to the number of sectors addressed. Image and text from
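The three components above add up to the total access time. The following is a minimal sketch of that sum, not a model of any real drive: the class name, the method names, and every drive parameter (RPM, seek time, sectors per track) are assumptions chosen for illustration.

```java
// Sketch: estimated time to read a run of sectors from a spinning disk.
// All drive parameters used here are illustrative assumptions, not measurements.
class DiskAccessEstimate {
    /** Average rotational latency: half of one full rotation, in milliseconds. */
    static double rotationalLatencyMs(double rpm) {
        double msPerRotation = 60_000.0 / rpm;
        return msPerRotation / 2.0;
    }

    /** Transfer time: directly proportional to the number of sectors read. */
    static double transferMs(double rpm, int sectorsPerTrack, int sectorsRead) {
        double msPerRotation = 60_000.0 / rpm;
        return msPerRotation * sectorsRead / sectorsPerTrack;
    }

    /** Total access time = seek time + rotational latency + transfer time. */
    static double accessMs(double seekMs, double rpm, int sectorsPerTrack, int sectorsRead) {
        return seekMs + rotationalLatencyMs(rpm) + transferMs(rpm, sectorsPerTrack, sectorsRead);
    }

    public static void main(String[] args) {
        // Hypothetical drive: 9 ms average seek, 6000 RPM, 500 sectors per track.
        System.out.println(accessMs(9.0, 6000, 500, 8) + " ms to read 8 sectors");
    }
}
```

With these assumed numbers the seek and the rotational latency dominate; the transfer of a few sectors is a small fraction of the total, which is why reading a whole block at once costs little more than reading one item.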
9 Hash Tables vs. B-Trees Hash tables great for selecting individual items –Fast search and insert –O(1) if the table size and hash function are well-chosen BUT –Hash tables inefficient for finding sets of information with similar keys or for doing range searches (e.g., All documents published in a date range) –We often need this in text and DBMS applications Search trees are better for this –Can place subtrees near each other on disk B-Trees are a popular kind of disk-based search tree
10 B-Trees Goal: efficient access to sorted information Balanced Structure Sorted Keys Each node has many children Each node contains many data items –These are stored in an array in sorted order B-Tree is defined in terms of rules: –Makes use of a notion of a constant MINIMUM –These rules can vary We’ll use the ones in the Main book
11 B-Tree Rules (from Main) Rule 1: –The root may have as few as 1 data item –Every other node has at least MINIMUM items Rule 2: –The maximum number of elements in a node is twice the value of MINIMUM Rule 3: –The elements of each B-tree node are stored in a partially filled array –Sorted from smallest (item 0) to largest
12 B-Tree Rules (from Main) Rule 4: –The number of subtrees (children) of a non-leaf node is always one more than the number of items stored in the node. Rule 5: –For any non-leaf node: (a) A key at index i is greater than all the keys in subtree number i for a given node. (b) A key at index i is less than all the keys in subtree number i+1 for a given node. Rule 6: –Every leaf in a B-tree has the same depth
13 Illustration of Rules 4 and 5 [Figure: a node holding the keys 93 and 107 has three subtrees. Subtree 0 holds all keys below 93, Subtree 1 holds the keys between 93 and 107, and Subtree 2 holds all keys above 107.] Note: we could use some other ordering here besides integers
14 Example B-tree (MINIMUM = 1) Does this meet all the rule conditions? [Figure: example B-tree]
15 Implementing B-Trees Requires recursive thinking –Every child of the root node is also the root of a smaller B-tree
16 Example B-tree Each subtree recursively acts like a B-tree. [Figure: example B-tree]
17 Implementing B-Trees Requires recursive thinking –Every child of the root node is also the root of a smaller B-tree Defining the class –In this definition, data is also used as the keys –Static Variables: private static final int MINIMUM = 200; private static final int MAXIMUM = MINIMUM*2; –Instance Variables: int dataCount; int[] data = new int[MAXIMUM + 1]; int childCount; IntBTree[] node = new IntBTree[MAXIMUM + 2]; (extra room here to help with the implementation of add node)
18 Searching for an Item boolean contains(int target): set i equal to the first index in this node such that data[i] >= target (or dataCount if there is no such index) if (target found at data[i]) return true else if (this node has no children) return false else return node[i].contains(target)
19 Find 18? [Figure: the contains steps traced on the example B-tree]
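The search pseudocode can be sketched in Java as follows. The field names (dataCount, data, childCount, node) follow the slides; the class name IntBTreeSearch, the varargs constructor, and the small demo tree are hypothetical additions so that the example is self-contained.

```java
// Minimal sketch of B-tree search. Field names follow the slides; the class
// name, constructor, and demo tree are hypothetical and exist only for the demo.
class IntBTreeSearch {
    int dataCount;
    int[] data;
    int childCount;
    IntBTreeSearch[] node;   // the children (subtrees) of this node

    IntBTreeSearch(int[] items, IntBTreeSearch... children) {
        data = items;
        dataCount = items.length;
        node = children;
        childCount = children.length;
    }

    boolean contains(int target) {
        // Find the first index i with data[i] >= target (i == dataCount if none).
        int i = 0;
        while (i < dataCount && data[i] < target) {
            i++;
        }
        if (i < dataCount && data[i] == target) {
            return true;                      // found in this node
        } else if (childCount == 0) {
            return false;                     // leaf: nowhere left to look
        } else {
            return node[i].contains(target);  // recurse into subtree i
        }
    }

    public static void main(String[] args) {
        // Hypothetical tree with MINIMUM = 1: root {6}, children {2, 4} and {9}.
        IntBTreeSearch root = new IntBTreeSearch(new int[] {6},
                new IntBTreeSearch(new int[] {2, 4}),
                new IntBTreeSearch(new int[] {9}));
        System.out.println(root.contains(4));
        System.out.println(root.contains(5));
    }
}
```

The linear scan for i keeps the sketch short; within a wide node, a binary search over the sorted data array would do the same job in fewer comparisons.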
20 Adding a Node Tricky because of the need to maintain the B-Tree rules The strategy: –First place the new item wherever it belongs, according to the value of the key –Then if a node has too many items, recursively split the too-large node until the B-Tree condition is recovered.
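21 Add 19. First, place the item where it belongs numerically. (MAXIMUM=2) [Figure: the node that receives 19 now holds 18, 19, 22, one item more than MAXIMUM]
22 Now propagate the extra item up a level to restore the B-Tree condition. Requires a node split. [Figure: the tree after the too-large node is split]
23 Propagate again, recursively. [Figure: the tree after the next split]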
24 In the middle of add node, need to split a too-large node, passing the extra item up to the parent. (MINIMUM=2, MAXIMUM=4) [Figure: the node 13, 16, 19, 22, 25 holds five items, one more than MAXIMUM; its parent holds 9, 28]
25 [Figure: the node is split into 13, 16 and 22, 25; the middle item 19 is passed up]
26 [Figure: the parent now holds 9, 19, 28, with 13, 16 and 22, 25 as separate children]
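The item-level part of the split can be sketched as below, using the node 13, 16, 19, 22, 25 from the example with MINIMUM = 2. This is a simplified illustration, not the book's implementation: the names SplitResult and NodeSplit are hypothetical, and redistributing the node's children between the two halves is left out.

```java
import java.util.Arrays;

// Hypothetical holder for the outcome of a split: two halves plus the item passed up.
class SplitResult {
    final int[] left;
    final int middle;
    final int[] right;

    SplitResult(int[] left, int middle, int[] right) {
        this.left = left;
        this.middle = middle;
        this.right = right;
    }
}

// Sketch of splitting the items of a too-large node (children are not handled here).
class NodeSplit {
    /** Splits a sorted array of 2*minimum + 1 items around its middle item. */
    static SplitResult split(int[] items, int minimum) {
        if (items.length != 2 * minimum + 1) {
            throw new IllegalArgumentException("node is not exactly one item over MAXIMUM");
        }
        int[] left = Arrays.copyOfRange(items, 0, minimum);                  // first MINIMUM items
        int middle = items[minimum];                                         // goes up to the parent
        int[] right = Arrays.copyOfRange(items, minimum + 1, items.length);  // last MINIMUM items
        return new SplitResult(left, middle, right);
    }

    public static void main(String[] args) {
        SplitResult r = split(new int[] {13, 16, 19, 22, 25}, 2);
        System.out.println(Arrays.toString(r.left) + " " + r.middle + " " + Arrays.toString(r.right));
    }
}
```

Both halves end up with exactly MINIMUM items, so Rule 1 still holds for them; the parent gains one item and may itself need splitting, which is where the recursion comes from.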
27 B-Tree Running Time Analysis Worst case time for: –Searching for an item? O(d) –Adding an item? O(d) But this is in terms of d (the depth of the tree), not n (number of nodes) What about n? –Depth of the B-tree is never more than O(log n) But what if the B-tree has very wide nodes? –There is a tradeoff; we’ll see this soon for B+trees
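The O(log n) depth bound can be made concrete. The sketch below assumes the Main rules given earlier and, as a simplification, takes n to be the number of stored items rather than nodes. The sparsest legal tree has 1 item in the root and MINIMUM items in every other node, so a tree of depth d (root at depth 0) holds at least 2(MINIMUM+1)^d - 1 items. The class and method names are hypothetical.

```java
// Sketch: how deep can a B-tree get? Assumes the root holds at least 1 item and
// every other node holds at least MINIMUM items (so at least MINIMUM + 1 children).
class BTreeDepth {
    /** Fewest items a B-tree of the given depth can hold (root is depth 0). */
    static long minItems(int minimum, int depth) {
        long power = 1;
        for (int i = 0; i < depth; i++) {
            power *= (minimum + 1);
        }
        return 2 * power - 1;
    }

    /** Largest depth possible for a B-tree holding n >= 1 items. */
    static int maxDepth(int minimum, long n) {
        int depth = 0;
        while (minItems(minimum, depth + 1) <= n) {
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        // With MINIMUM = 200 as in the class definition, a billion items stay shallow.
        System.out.println(maxDepth(200, 1_000_000_000L));
    }
}
```

The wider the nodes, the larger the base of the logarithm and the shallower the tree, at the price of more work inside each node.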
28 Slide adapted from cis.stvincent.edu B+Trees Differences from B-Tree –Assume the actual data is in a separate file on disk –Internal nodes store keys only Each node may contain many keys Designed to be “branchy” or “bushy” Designed to have shallow height Has a limit on the number of keys per node –This way only a small number of disk blocks need to be read to find the data of interest –Only leaves store data records The leaf nodes refer to memory locations on disk Each leaf is linked to an adjacent leaf
29 B+Tree and Disk Reads Goal: –Optimize the B+tree structure so that a minimum number of disk blocks need to be read If the number of keys is not too large, keep all of the B+tree in memory Otherwise, –Keep the root and first levels of nodes in memory –Organize the tree so that each node fits within a disk block in order to reduce the number of disk reads
30 Slide adapted from lecture by Hector Garcia-Molina B+tree rules: tree of order s (1) All leaves at same lowest level (balanced tree) (2) Pointers in leaves point to records except for “sequence pointer”
31 Slide adapted from lecture by Hector Garcia-Molina B+Tree Sizes Size of nodes: s keys, s+1 pointers. Don’t want nodes to be too empty. Use at least: Non-leaf: (s+1)/2 pointers; Leaf: (s+1)/2 pointers to data
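Since a node holds s keys and s+1 pointers and should fit within one disk block, the block size puts an upper limit on s. The following is a rough sketch of that arithmetic; the block size, key size, and pointer size are assumed values, the names are hypothetical, and per-node header overhead is ignored.

```java
// Sketch: the largest order s for which one node fits in one disk block.
// Sizes are in bytes and are illustrative assumptions; node headers are ignored.
class NodeCapacity {
    /** Largest s such that s keys plus s + 1 pointers fit in blockBytes. */
    static int maxKeysPerNode(int blockBytes, int keyBytes, int pointerBytes) {
        // s*key + (s+1)*ptr <= block   =>   s <= (block - ptr) / (key + ptr)
        return (blockBytes - pointerBytes) / (keyBytes + pointerBytes);
    }

    public static void main(String[] args) {
        // Hypothetical layout: 4096-byte blocks, 8-byte keys, 8-byte pointers.
        System.out.println(maxKeysPerNode(4096, 8, 8));
    }
}
```

Under these assumptions the limit is a few hundred keys per node.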
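32 Slide adapted from lecture by Hector Garcia-Molina B+Tree Example s= [Figure: example B+tree with its root labeled]
33 Slide adapted from lecture by Hector Garcia-Molina Sample non-leaf node [Figure: a non-leaf node; each pointer leads to the subtree holding one range of keys, with range boundaries that include 81 and 95]
34 Slide adapted from lecture by Hector Garcia-Molina Sample leaf node: reached from a non-leaf node; holds pointers to the record with key 47, the record with key 50, and the record with key 51, plus a pointer to the next leaf in sequence
35 Slide adapted from lecture by Hector Garcia-Molina [Figure: a full node and a minimum node, shown for both non-leaf and leaf nodes] s=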
36 Example Use of B+Trees Recall that an inverted index is composed of –A Dictionary file and –A Postings file Use a B+Tree for the dictionary –The keys are the words –The values stored on disk are the postings
37 Using B+Trees for Inverted Index Use it to store the dictionary More efficient for searches on words with the same prefix –count* matches count, counter, counts, countess –Can store the postings for these terms near one another –Then only one disk seek is needed to get to these
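Prefix matching works because all words sharing a prefix are adjacent in sorted key order. The sketch below illustrates this with java.util.TreeMap, which is an in-memory red-black tree rather than a B+tree; it stands in here only because it also keeps keys sorted. The class name, the method name, and the postings offsets are hypothetical.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: prefix lookup over a sorted dictionary. TreeMap is an in-memory stand-in
// for the B+tree; the Long values play the role of (made-up) postings offsets.
class PrefixLookup {
    /** Returns the contiguous run of entries whose key starts with the prefix. */
    static SortedMap<String, Long> withPrefix(TreeMap<String, Long> dictionary, String prefix) {
        // Every string starting with the prefix sorts at or after the prefix itself
        // and before the prefix followed by the largest possible char.
        return dictionary.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        TreeMap<String, Long> dictionary = new TreeMap<>();
        long offset = 0;
        for (String word : new String[] {"couch", "count", "counter", "countess", "counts", "cover"}) {
            dictionary.put(word, offset);
            offset += 100;   // hypothetical postings offsets
        }
        System.out.println(withPrefix(dictionary, "count").keySet());
    }
}
```

A hash table offers no comparable operation: the matching words would be scattered across buckets, so every entry would have to be examined.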
38 Inverted Index Dictionary Postings
39 Slide adapted from lecture by Hector Garcia-Molina Insert into B+tree (a) simple case –space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root
40 Slide adapted from lecture by Hector Garcia-Molina (a) Insert key = 32 s=
41 Slide adapted from lecture by Hector Garcia-Molina (b) Insert key = 7 s=
42 Slide adapted from lecture by Hector Garcia-Molina (c) Insert key = 160 s=
43 Slide adapted from lecture by Hector Garcia-Molina (d) New root, insert 45 s= new root
44 Slide adapted from lecture by Hector Garcia-Molina Interesting problem: For B+tree, how large should s be? … s is the number of keys / node
45 What is the expected running time for finding an item in the B+tree? Assume B+tree with nodes of size s Assume all of the index is in memory –Use binary search to locate the appropriate key within a node This takes a + b log_2 s for constants a and b Remember that s is a constant Assume the B+tree is full –# nodes to examine is log_s n, where n = # records If n dominates –(Meaning that the tree is deeper than the nodes are wide) –O(log_s n) If s dominates –(Meaning the nodes are wider than the tree is deep) –O(log_2 s)
46 Slide adapted from lecture by Hector Garcia-Molina Sample assumptions: (order s B+tree) (1) Time to read node from disk is ( s) msec. (2) Once block in memory, use binary search to locate key: (a + b log_2 s) msec, for some constants a, b; assume a << 70 (3) Assume B+tree is full, i.e., # nodes to examine is log_s n, where n = # records
47 Slide adapted from lecture by Hector Garcia-Molina Can get: f(s) = time to find a record [Plot: f(s) against s, with its minimum at s_opt] Thus if s is too big or too small, problems result
48 Slide adapted from lecture by Hector Garcia-Molina Find s_opt by solving f’(s) = 0 Answer is s_opt = “few hundred” What happens to s_opt as the disk gets faster? As the CPU gets faster?
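The tradeoff can also be explored numerically instead of by differentiating. The sketch below assumes a cost model of the form f(s) = log_s n × (disk read time + a + b log_2 s), with a disk read time that grows linearly in s. Every constant is a hypothetical stand-in, not a figure from the slides: a fixed 70 msec per node read (suggested only by the "a << 70" assumption), 0.05 msec per key, and small CPU constants. The class and method names are likewise invented.

```java
// Sketch: scan node sizes s and pick the one that minimizes the time to find a record.
// All cost constants are hypothetical, in msec; they are not taken from the slides.
class NodeSizeTradeoff {
    static final double DISK_FIXED = 70.0;    // assumed fixed cost to read one node
    static final double DISK_PER_KEY = 0.05;  // assumed extra read cost per key in the node
    static final double CPU_A = 0.001;        // assumed constant a in a + b log_2 s
    static final double CPU_B = 0.001;        // assumed constant b in a + b log_2 s

    /** f(s): nodes examined (log_s n) times the cost of handling one node. */
    static double findTime(int s, double n) {
        double nodesExamined = Math.log(n) / Math.log(s);
        double binarySearch = CPU_A + CPU_B * (Math.log(s) / Math.log(2));
        double perNode = DISK_FIXED + DISK_PER_KEY * s + binarySearch;
        return nodesExamined * perNode;
    }

    /** Returns the s in [2, maxS] with the smallest f(s). */
    static int bestNodeSize(double n, int maxS) {
        int best = 2;
        for (int s = 3; s <= maxS; s++) {
            if (findTime(s, n) < findTime(best, n)) {
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(bestNodeSize(1e9, 100_000));
    }
}
```

Changing DISK_FIXED or CPU_A and rerunning the scan is one way to experiment with how the best node size reacts to faster hardware under this assumed model.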
49 Slide adapted from lecture by Hector Garcia-Molina Tradeoffs: –B-trees have faster lookup than B+trees –But B+trees have faster lookup when using fixed-size blocks –In a B-tree, deletion is more complicated –B+trees are often preferred
50 Choosing Data Structures Name example applications best suited for –Hash Tables –B+Trees
51 Next Time Sorting Algorithms