CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #7
Terms: sequential file (data file); index file (key-pointer pairs); search key; primary index; secondary index; dense index; sparse index; multi-level index
B+trees: support fast search; support range search; support dynamic changes; could be either dense or sparse
[Figure: B+tree example, with the root labeled; n = (value not preserved)]
Sample non-leaf node with keys 57, 81, 95: to keys k < 57; to keys 57 ≤ k < 81; to keys 81 ≤ k < 95; to keys k ≥ 95
Sample leaf node: reached from a non-leaf node; pointers to the record with key 57, the record with key 81, and the record with key 85; plus a sequence pointer to the next leaf
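A minimal sketch of how the sequence pointer supports range search, assuming a leaf is modeled as a sorted key list plus a link to the next leaf. The names `Leaf`, `records`, and `range_scan` are illustrative and not from the notes; strings stand in for record pointers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]                  # sorted search keys stored in this leaf
    records: List[str]               # records[i] stands in for the pointer to the record with keys[i]
    next: Optional["Leaf"] = None    # sequence pointer to the next leaf


def range_scan(start: Leaf, lo: int, hi: int) -> List[str]:
    """Collect records with lo <= key <= hi by walking sequence pointers from `start`."""
    out = []
    leaf = start
    while leaf is not None:
        for k, r in zip(leaf.keys, leaf.records):
            if k > hi:
                return out           # keys are sorted, so nothing further can match
            if k >= lo:
                out.append(r)
        leaf = leaf.next
    return out


# Two chained leaves; the first holds the keys of the sample leaf (57, 81, 85).
second = Leaf([95, 100], ["rec95", "rec100"])
first = Leaf([57, 81, 85], ["rec57", "rec81", "rec85"], next=second)
print(range_scan(first, 81, 95))     # ['rec81', 'rec85', 'rec95']
```

In a full tree the scan would start at the leaf found by searching for `lo`; here the starting leaf is passed in directly to keep the sketch short.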
Size of nodes (fixed): n+1 pointers, n keys
Don't want nodes to be too empty. Use at least: non-leaf: ⌈(n+1)/2⌉ pointers; leaf: ⌊(n+1)/2⌋ pointers to data
[Figure: full node and minimum node, for a non-leaf and for a leaf; n = (value not preserved)]
B+tree rules (tree of order n): (1) All leaves are at the same lowest level (balanced tree). (2) Pointers in leaves point to records, except for the "sequence pointer".
(3) Number of pointers/keys for B+tree:

                      Max ptrs   Max keys   Min ptrs                Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉               ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋ (to data)     ⌊(n+1)/2⌋
Root                  n+1        n          2 (could be 1)          1
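The occupancy bounds can be checked for a concrete n with a short sketch. It assumes the rounding is a ceiling for non-leaf nodes and a floor for leaves, as in the usual textbook statement of these rules; the function name `occupancy` and the dictionary layout are illustrative only.

```python
def occupancy(n: int) -> dict:
    """Max/min pointer and key counts per node type for a B+tree with n keys per node."""
    ceil_half = (n + 2) // 2     # integer form of ceil((n + 1) / 2)
    floor_half = (n + 1) // 2    # floor((n + 1) / 2)
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": ceil_half, "min_keys": ceil_half - 1},
        # For a leaf, min_ptrs counts pointers to data records only.
        "leaf": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": floor_half, "min_keys": floor_half},
        # A root needs at least 2 pointers (1 in the degenerate case noted in the table).
        "root": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": 2, "min_keys": 1},
    }


for n in (3, 4):
    print(n, occupancy(n)["non-leaf"], occupancy(n)["leaf"])
```

For n = 4 this gives a non-leaf minimum of 3 pointers and 2 keys, and a leaf minimum of 2 data pointers and 2 keys.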
Search in a B+tree: start from the root; search in a leaf block; may not have to go to the data file
Pseudo code for search in a B+tree:
Search(ptr, k);  // search for a record with key value k in the subtree rooted at *ptr
  Case 1. *ptr is a leaf:
    IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null);
  Case 2. *ptr is not a leaf:
    find a key ki in *ptr such that ki <= k < k(i+1);  // take p0 when k < k1, and the last pointer when k >= the last key
    return(Search(pi, k));
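The search pseudo code can be sketched in Python as follows. The classes `Leaf` and `NonLeaf` and the use of strings as record pointers are assumptions made for illustration; `bisect_right` from the standard library picks the child index i with ki <= k < k(i+1).

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]      # sorted keys
    ptrs: List[str]      # ptrs[i] stands in for the record pointer of keys[i]


@dataclass
class NonLeaf:
    keys: List[int]      # k1..km, sorted
    children: list       # p0..pm; children[i] covers keys[i-1] <= k < keys[i]


def search(node: Union[Leaf, NonLeaf], k: int) -> Optional[str]:
    """Return the record pointer for key k in the subtree rooted at node, or None."""
    if isinstance(node, Leaf):                  # Case 1: leaf
        for ki, pi in zip(node.keys, node.ptrs):
            if ki == k:
                return pi
        return None
    i = bisect_right(node.keys, k)              # Case 2: number of keys <= k
    return search(node.children[i], k)


root = NonLeaf(
    keys=[57, 81, 95],
    children=[
        Leaf([10, 30], ["rec10", "rec30"]),
        Leaf([57, 60], ["rec57", "rec60"]),
        Leaf([81, 85], ["rec81", "rec85"]),
        Leaf([95, 100], ["rec95", "rec100"]),
    ],
)
print(search(root, 85))   # rec85
print(search(root, 58))   # None
```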
Insert into B+tree: (a) simple case, space available in leaf; (b) leaf overflow; (c) non-leaf overflow; (d) new root
Pseudo code for insertion in a B+tree:
Insert(ptr, (k,p), (k',p'));
  // p is a pointer to a data record with key value k; (k,p) is inserted into the subtree rooted at *ptr.
  // p' is a pointer to a new sibling of *ptr, if one is created, in which k' is the smallest key value; otherwise p' = Null.
  Case 1. *ptr is a leaf:
    IF there is room in *ptr THEN insert (k,p) into *ptr; return(k'=0, p'=Null);
    ELSE arrange the content of *ptr and (k,p) in key order as (k1, p1), ..., (k(n+1), p(n+1));
      leave (k1, p1), ..., (k(r-1), p(r-1)) in *ptr;
      create a new leaf q; put (kr, pr), ..., (k(n+1), p(n+1)) in *q;
      make the sequence pointer of *ptr point to q, and that of *q point to the old next leaf;
      return(k' = kr, p' = q);
  Case 2. *ptr is not a leaf:
    find a key ki in *ptr such that ki <= k < k(i+1);
    Insert(pi, (k,p), (k",p"));
    IF (p" = Null) THEN return(k'=0, p'=Null);
    ELSE IF there is room in *ptr THEN insert (k",p") into *ptr; return(k'=0, p'=Null);
    ELSE arrange the content of *ptr and (k",p") as (p0, k1, p1, ..., k(n+1), p(n+1));
      leave (p0, k1, ..., k(r-1), p(r-1)) in *ptr;
      create a new non-leaf node q; put (pr, k(r+1), p(r+1), ..., k(n+1), p(n+1)) in *q;
      return(k' = kr, p' = q);
  // r is a split position near the middle. If the call on the root returns p' != Null, a new root is created (case (d)).
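A compact Python sketch of the insertion procedure, covering cases (a) through (d). It is one possible reading of the pseudo code, under these assumptions: keys are distinct, a split leaf keeps its middle key in the new leaf (so that key stays searchable), and the split positions are chosen to respect the minimum occupancy rules. `Node`, `N`, `insert_root`, and `leaf_keys` are illustrative names.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

N = 3  # maximum number of keys per node (illustrative choice)


class Node:
    def __init__(self, leaf: bool):
        self.leaf = leaf
        self.keys: List[int] = []
        self.ptrs: list = []                  # leaf: record pointers; non-leaf: children p0..pm
        self.next: Optional["Node"] = None    # sequence pointer (leaves only)


def insert(node: Node, k: int, p) -> Optional[Tuple[int, Node]]:
    """Insert (k, p) below node. Return (k', q) if node was split, else None."""
    i = bisect_right(node.keys, k)
    if node.leaf:
        node.keys.insert(i, k)
        node.ptrs.insert(i, p)
        if len(node.keys) <= N:
            return None                       # (a) room in the leaf
        mid = (N + 2) // 2                    # (b) leaf overflow: keep ceil((N+1)/2) keys here
        q = Node(leaf=True)
        q.keys, node.keys = node.keys[mid:], node.keys[:mid]
        q.ptrs, node.ptrs = node.ptrs[mid:], node.ptrs[:mid]
        q.next, node.next = node.next, q      # splice q into the leaf sequence
        return q.keys[0], q                   # k' is the smallest key in q
    split = insert(node.ptrs[i], k, p)
    if split is None:
        return None
    k2, q2 = split
    node.keys.insert(i, k2)
    node.ptrs.insert(i + 1, q2)
    if len(node.keys) <= N:
        return None
    mid = (N + 1) // 2                        # (c) non-leaf overflow: the middle key moves up
    up = node.keys[mid]
    q = Node(leaf=False)
    q.keys, q.ptrs = node.keys[mid + 1:], node.ptrs[mid + 1:]
    node.keys, node.ptrs = node.keys[:mid], node.ptrs[:mid + 1]
    return up, q


def insert_root(root: Node, k: int, p) -> Node:
    """Insert into the tree and return the (possibly new) root."""
    split = insert(root, k, p)
    if split is None:
        return root
    new_root = Node(leaf=False)               # (d) new root
    new_root.keys, new_root.ptrs = [split[0]], [root, split[1]]
    return new_root


def leaf_keys(root: Node) -> List[List[int]]:
    """Keys of each leaf, left to right, following sequence pointers."""
    node = root
    while not node.leaf:
        node = node.ptrs[0]
    out = []
    while node is not None:
        out.append(list(node.keys))
        node = node.next
    return out


root = Node(leaf=True)
for key in [3, 5, 11, 30, 7]:
    root = insert_root(root, key, f"rec{key}")
print(root.keys, leaf_keys(root))             # [11] [[3, 5, 7], [11, 30]]
```

Inserting 30 overflows the single leaf and creates a new root with separator 11; inserting 7 afterwards is the simple case.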
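(a) Simple case: insert key = 32 [figures: tree before and after the insertion]
(b) Leaf overflow: insert key = 7 [figures: tree before and after the insertion]
(c) Non-leaf overflow: insert key = 160 [figures: tree before and after the insertion]
(d) New root: insert key = 45 [figures: tree before and after the insertion, with the new root marked]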
Deletion from B+tree: (a) simple case (no example); (b) coalesce with neighbor (sibling); (c) redistribute keys; (d) cases (b) or (c) at non-leaf
(b) Coalesce with sibling, n = 4 [figures: tree before and after the deletion]
(c) Redistribute keys, n = 4 [figures: tree before and after the deletion]
(d) Non-leaf coalesce: delete 37 [figures: tree before and after the deletion, with the new root marked]
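The choice between the simple case, redistribution, and coalescing at a leaf can be sketched as below. This is a simplified illustration, not a full deletion routine: leaves are plain sorted key lists, only a right sibling is considered, and the parent update (a new separator key after redistribution, one key and one pointer removed after coalescing) is left out. The function name `delete_from_leaf` is hypothetical.

```python
from typing import List, Tuple


def delete_from_leaf(leaf: List[int], right: List[int], key: int,
                     n: int) -> Tuple[str, List[int], List[int]]:
    """Delete key from `leaf`, repairing underflow with the right sibling `right`.

    Returns (action, new_leaf, new_right).
    """
    min_keys = (n + 1) // 2                  # leaf minimum: floor((n + 1) / 2)
    leaf = [k for k in leaf if k != key]
    right = list(right)
    if len(leaf) >= min_keys:
        return "simple", leaf, right         # (a) leaf is still full enough
    if len(right) > min_keys:
        leaf.append(right.pop(0))            # (c) borrow the sibling's smallest key
        return "redistribute", leaf, right
    return "coalesce", leaf + right, []      # (b) merge both leaves into one


print(delete_from_leaf([10, 20, 25], [30, 40], 20, n=4))   # ('simple', [10, 25], [30, 40])
print(delete_from_leaf([10, 20], [30, 40, 50], 20, n=4))   # ('redistribute', [10, 30], [40, 50])
print(delete_from_leaf([10, 20], [30, 40], 20, n=4))       # ('coalesce', [10, 30, 40], [])
```

Coalescing always fits in one node here: the underfull leaf holds one key fewer than the minimum and the sibling holds exactly the minimum, so the merged leaf has at most n keys.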
B+tree deletions in practice: often, coalescing is not implemented; too hard and not worth it!
Interesting problem: for a B+tree, how large should n be? (n is the number of keys per node)
Sample assumptions:
(1) Time to read a node from disk is $(S + Tn)$ ms ($S$, $T$: constants, e.g. $S = 70$, $T = 0.5$)
(2) Once the block is in memory, use binary search: $(a + b \log_2 n)$ ms ($a$, $b$: constants, $a \ll S$)
(3) Assume the B+tree is full, i.e., the number of nodes to examine is $\log_n N$, where $N$ = number of records
(4) Total search time: $f(n) = (\log_n N + 1)(S + Tn) + (a + b \log_2 n)$
Can get: $f(n)$ = time to find a record [plot: $f(n)$ against $n$, with $n_{opt}$ marked on the $n$ axis]
Find $n_{opt}$ by setting $f'(n) = 0$. Answer: $n_{opt}$ = "few hundred"
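Instead of solving f'(n) = 0 by hand, the minimum of f(n) can be located numerically. The sketch below evaluates the total search time formula over a range of n and returns the minimizer. S and T default to the sample values from the assumptions; the values of a, b, the record count N, and the search bound `n_max` are placeholders chosen here, not figures from the notes.

```python
import math


def search_time(n: int, N: float, S: float = 70.0, T: float = 0.5,
                a: float = 0.1, b: float = 0.1) -> float:
    """f(n) = (log_n N + 1)(S + T n) + (a + b log_2 n), in ms."""
    return (math.log(N, n) + 1) * (S + T * n) + (a + b * math.log2(n))


def best_n(N: float, n_max: int = 2000, **consts) -> int:
    """Brute-force minimizer of f(n) over 2 <= n <= n_max."""
    return min(range(2, n_max + 1), key=lambda n: search_time(n, N, **consts))


n_opt = best_n(10**6)
print(n_opt, round(search_time(n_opt, 10**6), 1))
```

The location of the minimum is sensitive to S, T, and N, so the printed value only illustrates the method for these placeholder inputs.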
Variation on B+tree: B-tree (no +). Idea: avoid duplicate keys; have record pointers in non-leaf nodes
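[Figure: B-tree non-leaf node with entries K1 P1, K2 P2, K3 P3. Each Pi points to the record with key Ki. The child pointers lead to keys < K1, to keys between K1 and K2, to keys between K2 and K3, and to keys > K3.]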
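A lookup in this B-tree layout can stop at a non-leaf node, since every key carries its own record pointer. A minimal sketch, assuming a node is a sorted key list with parallel record pointers and an optional child list; `BTreeNode` and `btree_search` are illustrative names.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BTreeNode:
    keys: List[int]                    # K1..Km, sorted; each key appears once in the tree
    records: List[str]                 # records[i] stands in for Pi, the pointer to the record with keys[i]
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf, else m+1 children


def btree_search(node: BTreeNode, k: int) -> Optional[str]:
    """Return the record pointer for key k, or None if k is absent."""
    i = bisect_left(node.keys, k)      # first index with keys[i] >= k
    if i < len(node.keys) and node.keys[i] == k:
        return node.records[i]         # found here, possibly in a non-leaf node
    if not node.children:
        return None                    # reached a leaf without finding k
    return btree_search(node.children[i], k)


root = BTreeNode(
    keys=[65, 125],
    records=["rec65", "rec125"],
    children=[
        BTreeNode([25, 45], ["rec25", "rec45"]),
        BTreeNode([85, 105], ["rec85", "rec105"]),
        BTreeNode([145, 165], ["rec145", "rec165"]),
    ],
)
print(btree_search(root, 65))    # rec65  (found at the root)
print(btree_search(root, 85))    # rec85  (found in a leaf)
print(btree_search(root, 90))    # None
```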
[Figure: B-tree example; n = (value not preserved)] Sequence pointers not useful now!
Tradeoffs: B-trees can have faster lookup than B+trees (a search may end at a non-leaf node); in a B-tree, non-leaf and leaf nodes have different sizes; in a B-tree, insertion and deletion are more complicated. B+trees preferred!