1 CSE 326: Data Structures Trees
2 Today: Splay Trees
Fast both in worst-case amortized analysis and in practice
Are used in the kernel of NT to keep track of process information!
Invented by Sleator and Tarjan (1985)
Details: Weiss 4.5 (basic splay trees), 11.5 (amortized analysis), 12.1 (better “top down” implementation)
3 Basic Idea
“Blind” rebalancing: no height info kept!
Worst-case time per operation is O(n)
Worst-case amortized time is O(log n)
Insert/find always rotates the accessed node to the root!
Good locality:
–Most commonly accessed keys move high in the tree and become easier and easier to find
4 Idea
You’re forced to make a really deep access:
Since you’re down there anyway, fix up a lot of deep nodes!
Move n to the root by a series of zig-zag and zig-zig rotations, followed by a final single rotation (zig) if necessary
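5 Zig-Zag*
[Figure: before, g is the subtree root, p is its child, and n is p’s child on the opposite side, with subtrees X, Y, Z, W; after, n is the subtree root with p and g as its children. Legend: Helped, Unchanged, Hurt; depth labels: up 2, down 1, up 1, down 1.]
* This is just a double rotation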
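6 Zig-Zig
[Figure: before, g is the subtree root, p is its child, and n is p’s child on the same side, with subtrees W, X, Y, Z; after, n is the subtree root, p is its child, and g is p’s child.]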
7 Why Splaying Helps
Node n and its children are always helped (raised)
Except for the last step, nodes that are hurt by a zig-zag or zig-zig are later helped by a rotation higher up the tree!
Result:
–shallow nodes may increase depth by one or two
–helped nodes decrease depth by a large amount
If a node n on the access path is at depth d before the splay, it’s at about depth d/2 after the splay
–Exceptions are the root, the child of the root, and the node splayed
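The zig, zig-zig, and zig-zag steps described above can be sketched in Python. This is a minimal bottom-up sketch, assuming each node keeps a parent pointer; the names `Node`, `rotate_up`, `splay`, `right_chain`, `inorder`, and `height` are illustrative and do not come from the slides or from Weiss.

```python
class Node:
    """BST node with a parent pointer (an assumption of this sketch)."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate_up(n):
    """Single rotation that lifts n above its parent, keeping BST order."""
    p, g = n.parent, n.parent.parent
    if p.left is n:                      # n is a left child: rotate right
        p.left, n.right = n.right, p
        if p.left is not None:
            p.left.parent = p
    else:                                # n is a right child: rotate left
        p.right, n.left = n.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, n.parent = n, g
    if g is not None:                    # re-hang the rotated pair under g
        if g.left is p:
            g.left = n
        else:
            g.right = n

def splay(n):
    """Move n to the root with zig-zig / zig-zag steps and a final zig."""
    while n.parent is not None:
        p, g = n.parent, n.parent.parent
        if g is None:
            rotate_up(n)                 # zig: n's parent is the root
        elif (g.left is p) == (p.left is n):
            rotate_up(p)                 # zig-zig: rotate the parent first,
            rotate_up(n)                 # then n
        else:
            rotate_up(n)                 # zig-zag: a double rotation on n
            rotate_up(n)
    return n

def right_chain(keys):
    """Build a degenerate tree: each node is the right child of the previous."""
    root = prev = None
    for k in keys:
        node = Node(k)
        if prev is None:
            root = node
        else:
            prev.right, node.parent = node, prev
        prev = node
    return root, prev                    # (root, deepest node)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))
```

On a right-leaning chain of the keys 1 to 7, splaying the deepest node takes three zig-zig steps, makes 7 the root, and reduces the height from 6 to 4 while the in-order sequence stays the same.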
8 Splaying Example Find(6) zig-zig
9 Still Splaying 6 zig-zig
10 Almost There, Stay on Target zig
11 Splay Again Find(4) zig-zag
12 Example Splayed Out zig-zag
13 Locality
“Locality”: if an item is accessed, it is likely to be accessed again soon
–Why?
Assume m ≥ n accesses in a tree of size n
–Total worst-case time is O(m log n)
–O(log n) per access amortized time
Suppose only k distinct items are accessed in the m accesses.
–Time is O(n log n + m log k)
–The n log n term covers getting those k items near the root; the m log k term covers the accesses once those k items are all at the top of the tree
–Compare with O(m log n) for an AVL tree
14 Splay Operations: Insert
To insert, could do an ordinary BST insert
–but would not fix up tree
–A BST insert followed by a find (splay)?
Better idea: do the splay before the insert!
How?
15 Split
Split(T, x) creates two BSTs L and R:
–All elements of T are in either L or R
–All elements in L are ≤ x
–All elements in R are ≥ x
–L and R share no elements
Then how do we do the insert?
16 Split
Split(T, x) creates two BSTs L and R:
–All elements of T are in either L or R
–All elements in L are ≤ x
–All elements in R are > x
–L and R share no elements
Then how do we do the insert?
Insert as root, with children L and R
17 Splitting in Splay Trees
How can we split?
–We have the splay operation
–We can find x or the parent of where x would be if we were to insert it as an ordinary BST
–We can splay x or the parent to the root
–Then break one of the links from the root to a child
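18 Split
[Figure: split(x) splays T, then breaks one link at the root to produce L and R. The node splayed to the root could be x, or what would have been the parent of x. If the root is ≤ x, the root stays in L and its right subtree (> x) becomes R; if the root is > x, the root stays in R and its left subtree (< x) becomes L.]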
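19 Back to Insert
Insert(x):
–Split on x
–Join subtrees using x as root
[Figure: split(x) yields L (keys ≤ x) and R (keys > x); a new node x becomes the root with L as its left child and R as its right child.]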
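Split followed by "insert as root" can be sketched as below. This is a hedged sketch, not the slides' or Weiss's code: it uses a compact recursive splay that pairs rotations starting from the top of the search path, so intermediate tree shapes can differ from the bottom-up examples, and it treats a repeated key as a no-op (an assumption). All names (`Node`, `splay`, `split`, `insert`, `inorder`) are illustrative.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(p):
    n = p.left
    p.left, n.right = n.right, p
    return n

def rotate_left(p):
    n = p.right
    p.right, n.left = n.left, p
    return n

def splay(root, key):
    """Bring key, or the last node on its search path, to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                      # zig-zig
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:                    # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:                         # zig-zig
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)
    elif key < root.right.key:                       # zig-zag
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)

def split(root, x):
    """Return (L, R): every key in L is <= x, every key in R is > x."""
    if root is None:
        return None, None
    root = splay(root, x)
    if root.key <= x:                    # root stays in L; cut its right link
        right, root.right = root.right, None
        return root, right
    left, root.left = root.left, None    # root stays in R; cut its left link
    return left, root

def insert(root, x):
    """Split on x, then make x the new root with children L and R."""
    left, right = split(root, x)
    if left is not None and left.key == x:   # x already present (assumption:
        left.right = right                   # keep a single copy)
        return left
    return Node(x, left, right)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

After inserting 5, 2, 8, 1, 9, 3 in that order, the root is the most recently inserted key, 3, and the in-order traversal is sorted.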
20 Insert Example Insert(5) split(5)
21 Splay Operations: Delete
[Figure: find(x) splays x to the root, with subtrees L (< x) and R (> x); deleting x leaves the two separate trees L and R.]
Now what?
22 Join
Join(L, R): given two trees such that L < R, merge them
Splay on the maximum element in L, then attach R
[Figure: after the splay, the root of L has no right child, so R is attached there.]
23 Delete Completed
[Figure: find(x) on T splays x to the root with subtrees L (< x) and R (> x); delete x; Join(L, R) produces T - x.]
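Delete as "find, remove the root, join" can be sketched as follows. A compact recursive splay helper is included so the sketch runs on its own; it pairs rotations from the top of the search path, which is an implementation choice of this sketch and not necessarily the lecture's. The names `Node`, `splay`, `join`, `delete`, and `inorder` are illustrative.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(p):
    n = p.left
    p.left, n.right = n.right, p
    return n

def rotate_left(p):
    n = p.right
    p.right, n.left = n.left, p
    return n

def splay(root, key):
    """Bring key, or the last node on its search path, to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                      # zig-zig
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:                    # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:                         # zig-zig
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)
    elif key < root.right.key:                       # zig-zag
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)

def join(left, right):
    """Merge two trees, assuming every key in left < every key in right."""
    if left is None:
        return right
    m = left
    while m.right is not None:           # locate the maximum of left
        m = m.right
    left = splay(left, m.key)            # the maximum has no right child
    left.right = right
    return left

def delete(root, x):
    """Splay x to the root, drop it, and join its two subtrees."""
    root = splay(root, x)
    if root is None or root.key != x:    # x absent: the tree is only splayed
        return root
    return join(root.left, root.right)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

Deleting 4 from a balanced tree on the keys 1 to 7 leaves 3, the maximum of the left subtree, as the new root.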
24 Delete Example Delete(4) find(4) Find max
25 Splay Trees, Summary
Splay trees are arguably the most practical kind of self-balancing trees
If the number of finds is much larger than n, then locality is crucial!
–Example: word-counting
Also support efficient Split and Join operations, useful for other tasks
–E.g., range queries
26 Dictionary & Search ADTs
Dictionary ADT (aka map ADT): stores values associated with user-specified keys
–keys may be any (homogeneous) comparable type
–values may be any (homogeneous) type
Search ADT (aka Set ADT): stores keys only
27 Dictionary & Search ADTs
Operations:
–create : → dictionary
–insert : dictionary × key × values → dictionary
–find : dictionary × key → values
–delete : dictionary × key → dictionary
Example contents:
–kim chi: spicy cabbage
–Kreplach: tasty stuffed dough
–Kiwi: Australian fruit
Example calls:
–insert(kohlrabi, upscale tuber)
–find(kreplach) returns kreplach: tasty stuffed dough
28 Dictionary Implementations
Arrays:
–Unsorted
–Sorted
Linked lists
BST
–Random
–AVL
–Splay
29 Dictionary Implementations
         Arrays                      Lists         Binary Search Trees
         unsorted      sorted                      AVL        splay
insert   O(1)          O(n)          O(1)          O(log n)   O(log n) amortized
find     O(n)          O(log n)      O(n)          O(log n)   O(log n) amortized
delete   find + O(1)   O(n)          find + O(1)   O(log n)   O(log n) amortized
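The sorted-array costs can be made concrete with a small sketch: binary search gives the O(log n) find, while insert and delete pay O(n) to shift entries. The class name `SortedArrayDict` and the overwrite-on-duplicate behavior are assumptions of this sketch; `bisect` is Python's standard binary-search module.

```python
import bisect

class SortedArrayDict:
    """Dictionary ADT on two parallel sorted lists (illustrative name)."""
    def __init__(self):
        self._keys = []
        self._values = []

    def _index(self, key):
        """Binary search: O(log n) comparisons."""
        i = bisect.bisect_left(self._keys, key)
        found = i < len(self._keys) and self._keys[i] == key
        return i, found

    def find(self, key):
        i, found = self._index(key)
        return self._values[i] if found else None

    def insert(self, key, value):
        i, found = self._index(key)
        if found:
            self._values[i] = value          # overwrite: an assumption here
        else:
            self._keys.insert(i, key)        # O(n): shifts later entries
            self._values.insert(i, value)

    def delete(self, key):
        i, found = self._index(key)
        if found:
            del self._keys[i]                # O(n): shifts later entries
            del self._values[i]
        return found

d = SortedArrayDict()
d.insert("kreplach", "tasty stuffed dough")
d.insert("kohlrabi", "upscale tuber")
print(d.find("kreplach"))
```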
30 The last dictionary we discuss: B-Trees
Suppose we want to store the data on disk
A disk access is a lot more expensive than one CPU operation
Example
–1,000,000 entries in the dictionary
–An AVL tree requires log_2(1,000,000) ≈ 20 disk accesses: this is expensive
Idea in B-Trees:
–Increase the fan-out, decrease the height
–Make 1 node = 1 block
31 B-Trees Basics
All keys are stored at leaves
Nonleaf nodes have guidance keys, to help the search
Parameter d = the degree (the book uses the order M = 2d+1)
Rules for keys:
–The root is either a leaf, or has between 1 and 2d keys
–All other nodes (except the root) have between d and 2d keys
Rule for number of children:
–Each node (except leaves) has one more child than keys
Balance rule:
–The tree is perfectly balanced!
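32 B-Trees Basics
A non-leaf node:
[Figure: keys 30, 120, 240 with four child pointers covering k < 30, 30 <= k < 120, 120 <= k < 240, and 240 <= k.]
A leaf node:
[Figure: keys 40, 50, 60, each pointing to its record (record with key 40, record with key 50, record with key 60), plus a pointer to the next leaf.]
Then called a B+ tree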
33 B+Tree Example
d = 2 (M = 5)
Find the key 40
[Figure: example B+ tree and the search path for key 40.]
34 B+Tree Design
How large is d?
Example:
–Key size = 4 bytes
–Pointer size = 8 bytes
–Block size = 4096 bytes
2d × 4 + (2d+1) × 8 <= 4096
d = 170
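The sizing rule above can be checked with a few lines of Python. The function names `max_degree` and `node_bytes` are made up for this sketch; the formula is the slide's constraint that 2d keys and 2d+1 pointers must fit in one block.

```python
def node_bytes(d, key_size, pointer_size):
    """Bytes used by a full node: 2d keys plus 2d+1 pointers."""
    return 2 * d * key_size + (2 * d + 1) * pointer_size

def max_degree(block_size, key_size, pointer_size):
    """Largest d with 2d*key_size + (2d+1)*pointer_size <= block_size."""
    # Rearranged: 2d * (key_size + pointer_size) <= block_size - pointer_size
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))

d = max_degree(block_size=4096, key_size=4, pointer_size=8)
print(d, node_bytes(d, 4, 8), node_bytes(d + 1, 4, 8))
```

With the slide's sizes, a node with d = 170 uses 4088 bytes, while d = 171 would need 4112 bytes and no longer fit in a 4096-byte block.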
35 B+ Trees Depth
Assume d = 170
How deep is the B-tree?
Depth = 0 (just the root): at least 170 keys
Depth = 1: at least 171 × 170 ≈ 29 × 10^3 keys
Depth = 2: at least 171^2 × 170 ≈ 5 × 10^6 keys
Depth = 3: at least 171^3 × 170 ≈ 850 × 10^6 keys
Depth = 4: at least 171^4 × 170 ≈ 145 × 10^9 keys
Nobody has more keys!
With a B-tree we can find any data item with at most 5 disk accesses!
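The growth of the minimum key count with depth can be reproduced with a short calculation. This sketch assumes, as the slide's count does, that every node (including the root) holds at least d keys and so has at least d+1 children; `min_keys` is a name chosen for this sketch.

```python
def min_keys(d, depth):
    """Minimum keys at the leaf level of a tree of the given depth.

    Assumes every node has at least d keys, hence at least d + 1 children,
    so there are (d + 1) ** depth leaves with d keys each.
    """
    return (d + 1) ** depth * d

for depth in range(5):
    print(depth, min_keys(170, depth))
```

The exact values are 170, 29,070, 4,970,970, 850,035,870, and 145,356,133,770 for depths 0 through 4, which is roughly the scale quoted above.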
36 Insertion in a B+ Tree
Insert (K, P)
Find leaf where K belongs, insert
If no overflow (2d keys or less), halt
If overflow (2d+1 keys), split node, insert in parent:
[Figure: a node with keys K1 K2 K3 K4 K5 and pointers P0 P1 P2 P3 P4 P5 splits into a left node (K1 K2; P0 P1 P2) and a right node (K4 K5; P3 P4 P5), and K3 moves up to the parent.]
If leaf, keep K3 too in right node
When root splits, new root has 1 key only
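The split rule can be sketched on plain Python lists. This is a simplified illustration, not a full B+ tree: it covers only the split of a single overflowing node, and the function names `insert_into_leaf` and `split_internal` are hypothetical. A leaf keeps the middle key K3 in the right node, while an internal node moves K3 up without keeping a copy.

```python
import bisect

def insert_into_leaf(keys, key, d):
    """Insert key into a sorted leaf.

    Returns (keys, None, None) if the leaf still has at most 2d keys,
    otherwise (left_keys, separator, right_keys), where the separator
    goes to the parent and is also kept as the first key of the right leaf.
    """
    keys = list(keys)
    bisect.insort(keys, key)
    if len(keys) <= 2 * d:               # no overflow: halt
        return keys, None, None
    return keys[:d], keys[d], keys[d:]   # overflow: 2d+1 keys, split

def split_internal(keys, children, d):
    """Split an internal node holding 2d+1 keys and 2d+2 children.

    The middle key moves up to the parent and is kept in neither half.
    """
    left = (keys[:d], children[:d + 1])
    right = (keys[d + 1:], children[d + 1:])
    return left, keys[d], right

print(insert_into_leaf([10, 20, 40, 50], 30, d=2))
```

With d = 2, inserting 30 into the full leaf 10, 20, 40, 50 produces the left leaf 10, 20 and the right leaf 30, 40, 50, with 30 as the key inserted into the parent.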