CSE 326: Data Structures Trees
Lecture 8: Friday, Jan 24, 2003
Today: Splay Trees
Fast both in worst-case amortized analysis and in practice
Are used in the kernel of NT to keep track of process information!
Invented by Sleator and Tarjan (1985)
Details: Weiss 4.5 (basic splay trees), 11.5 (amortized analysis), 12.1 (better “top down” implementation)
We’ll start by introducing AVL trees. Then, I’d like to spend some time talking about double-tailed distributions and means. Next, we’ll find out what AVL stands for. Finally, you’ll receive a special bonus if we get to it! (Unfortunately, the bonus is AVL tree deletion)
Basic Idea
“Blind” rebalancing: no height info kept!
Worst-case time per operation is O(n)
Worst-case amortized time is O(log n)
Insert/find always rotates node to the root!
Good locality: the most commonly accessed keys move high in the tree and become easier and easier to find
Idea
Move n to the root by a series of zig-zag and zig-zig rotations, followed by a final single rotation (zig) if necessary
You’re forced to make a really deep access. Since you’re down there anyway, fix up a lot of deep nodes!
[Figure: example tree with keys 10, 17, 5, 2, 9, 3]
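Zig-Zag*
[Figure: zig-zag step on nodes n, p, g with subtrees W, X, Y, Z. Before: g on top, p its child, n the inner child of p. After: n on top with p and g as its children. Legend: Helped, Unchanged, Hurt. Labels: up 2, up 1, down 1, down 1.]
*This is just a double rotation.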
Zig-Zig
[Figure: zig-zig step on nodes n, p, g with subtrees W, X, Y, Z. Before: g on top, p its child, n the child of p on the same side. After: n on top, p its child, g the child of p.]
Can anyone tell me how to implement this with two rotations? There are two possibilities: start with rotate n or rotate p? Rotate p! Rotate n makes p n’s left child and then we’re hosed. Then, rotate n. This helps all the nodes in blue and hurts the ones in red. So, in some sense, it helps and hurts the same number of nodes on one rotation. Question: what if we keep rotating? What happens to this whole subtree? It gets helped!
Why Splaying Helps
Node n and its children are always helped (raised)
Except for the last step, nodes that are hurt by a zig-zag or zig-zig are later helped by a rotation higher up the tree!
Result:
shallow nodes may increase depth by one or two
helped nodes decrease depth by a large amount
If a node n on the access path is at depth d before the splay, it’s at about depth d/2 after the splay
Exceptions are the root, the child of the root, and the node splayed
Alright, remember what we did on Monday. We learned how to splay a node to the root of a search tree. We decided it would help because we’d do a lot of fixing up if we had an expensive access. That means we have to fix up the tree on every expensive access.
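Splaying Example
Find(6), first zig-zig step
[Figure: before, the chain 1, 2, 3, 4, 5, 6 from root to deepest node; after, 6 sits below 3, with 5 and then 4 beneath it]
Still Splaying
Second zig-zig step
[Figure: before, 1, 2, 3, 6 with 5 and 4 below 6; after, 6 is the child of 1, with 3 below 6, 2 and 5 below 3, and 4 below 5]
Almost There, Stay on Target
Final zig step
[Figure: 6 becomes the root with child 1; 3 is below 1 with children 2 and 5, and 4 is below 5]
Splay Again
Find(4), first zig-zag step
[Figure: before, root 6, then 1, then 3 with children 2 and 5, and 4 below 5; after, 4 is the child of 1 with children 3 and 5, and 2 is below 3]
Example Splayed Out
Second zig-zag step
[Figure: 4 becomes the root with children 1 and 6; 3 is below 1, 5 is below 6, and 2 is below 3]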
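The Find(6) and Find(4) walkthrough above can be reproduced with a small bottom-up splay. This is a minimal sketch under stated assumptions, not the lecture's code: the `Node` class, the `rotate_up` and `shape` helpers, and the choice to record the search path in a list (rather than store parent pointers) are all made up for illustration.

```python
class Node:
    """Plain BST node; no height or balance information is stored."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_up(child, parent):
    """Single rotation that lifts child above parent (hypothetical helper)."""
    if parent.left is child:
        parent.left, child.right = child.right, parent
    else:
        parent.right, child.left = child.left, parent


def splay(root, key):
    """Move key (or the last node on its search path) to the root."""
    if root is None:
        return None
    path, node = [], root
    while node is not None:            # ordinary BST search, remembering the path
        path.append(node)
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    n = path.pop()
    while path:
        p = path.pop()
        if not path:                   # zig: p is the root, one rotation
            rotate_up(n, p)
            break
        g = path.pop()
        gg = path[-1] if path else None
        g_was_left = gg is not None and gg.left is g
        if (g.left is p) == (p.left is n):
            rotate_up(p, g)            # zig-zig: rotate p first ...
            rotate_up(n, p)            # ... then rotate n
        else:
            rotate_up(n, p)            # zig-zag: rotate n twice
            if g.left is p:            # g must point at n before the 2nd rotation
                g.left = n
            else:
                g.right = n
            rotate_up(n, g)
        if gg is not None:             # re-attach the lifted subtree
            if g_was_left:
                gg.left = n
            else:
                gg.right = n
    return n


def shape(t):
    """Nested (key, left, right) tuples, handy for comparing trees."""
    return None if t is None else (t.key, shape(t.left), shape(t.right))


# The chain 1, 2, 3, 4, 5, 6 from the example, each node a right child.
root = None
for k in range(6, 0, -1):
    root = Node(k, right=root)
root = splay(root, 6)                  # Find(6)
after_find_6 = shape(root)
root = splay(root, 4)                  # Find(4)
after_find_4 = shape(root)
```

With this input, `after_find_6` has 6 at the root with 1 as its only child, and `after_find_4` has 4 at the root with children 1 and 6, matching the last figure of each walkthrough.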
Locality
“Locality”: if an item is accessed, it is likely to be accessed again soon. Why?
Assume m ≥ n accesses in a tree of size n
Total worst-case time is O(m log n), that is, O(log n) per access amortized time
Suppose only k distinct items are accessed in the m accesses
Time is O(n log n + m log k)
The n log n term: getting those k items near the root
The m log k term: those k items are all at the top of the tree
Compare with O(m log n) for an AVL tree
Splay Operations: Insert
To insert, we could do an ordinary BST insert, but that would not fix up the tree
A BST insert followed by a find (splay)?
Better idea: do the splay before the insert! How?
What about insert? Ideas? Can we just do BST insert? NO. Because then we could do an expensive operation without fixing up the tree.
Split
Split(T, x) creates two BSTs L and R:
All elements of T are in either L or R
All elements in L are ≤ x
All elements in R are > x
L and R share no elements
Then how do we do the insert? Insert x as the root, with children L and R
Splitting in Splay Trees
How can we split?
We have the splay operation
We can find x or the parent of where x would be if we were to insert it as an ordinary BST
We can splay x or the parent to the root
Then break one of the links from the root to a child
How can we implement this? We can splay. We can find x or where x ought to be. We can splay that spot to the root. Now, what do we have? The left subtree is all <= x. The right is all >= x.
Split
split(x): splay T; the new root could be x, or what would have been the parent of x
If the root is ≤ x: L is the root with its left subtree (all ≤ x), R is the right subtree (all > x)
OR
If the root is > x: L is the left subtree (all < x), R is the root with its right subtree (all > x)
So, a split just splays x’s spot to the root then hacks off one subtree. This code is _very_ pseudo. You should only use it as a general guideline.
Back to Insert
Insert(x):
Split on x
Join subtrees using x as root
[Figure: split(x) produces L (smaller keys) and R (larger keys); x becomes the new root with L as its left subtree and R as its right subtree]
Now, if we can split on x and produce one subtree smaller and one larger than x, insert is easy! Just split on x. Then, hang the left (smaller) subtree on the left of x. Hang the right (larger) subtree on the right of x. Pretty simple, huh? Are we fixing up deep paths?
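Insert Example
Insert(5)
[Figure: starting tree with root 6 and children 1 and 9; 4 is below 1, 2 is below 4, and 7 is below 9. split(5) yields L, rooted at 4 with 1 and then 2 below it, and R, rooted at 6 with 9 and then 7 below it.]
Let’s do some examples.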
Splay Operations: Delete
[Figure: find(x) brings x to the root; deleting x leaves subtrees L (keys < x) and R (keys > x)]
OK, we’ll do something similar for delete. We know x is in the tree. Find it and bring it to the root. Remove it. Now, we have two split subtrees. How do we put them back together?
Now what?
Join
Join(L, R): given two trees such that L < R, merge them
Splay on the maximum element in L, then attach R
[Figure: L is splayed so that its maximum is the root; R is attached as its right subtree]
The join operation puts two subtrees together as long as one has smaller keys to begin with. First, splay the max element of L to the root. Now, that’s guaranteed to have no right child, right? Just snap R onto that NULL right side of the max.
Delete Completed
[Figure: starting from T, find(x) brings x to the root; delete x leaves L (< x) and R (> x); Join(L, R) produces T - x]
So, we just join the two subtrees for delete.
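Delete Example
Delete(4)
[Figure: starting tree with root 6 and children 1 and 9; 4 is below 1, 2 is below 4, and 7 is below 9. find(4) brings 4 to the root; removing it leaves L, rooted at 1 with 2 below it, and R, rooted at 6 with 9 and then 7 below it.]
Find max
[Figure: the maximum of L, 2, is splayed to the root of L with 1 below it; R is attached on its right, giving root 2 with children 1 and 6, 9 below 6, and 7 below 9]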
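Split, Insert, Join, and Delete as described on the slides can be sketched on top of a bottom-up splay. This is a rough illustration, not the course's implementation: `Node`, `_rotate_up`, `shape`, and `example_tree` are hypothetical names, duplicates are assumed to be ignored on insert, and deleting an absent key is assumed to leave the (splayed) tree otherwise unchanged.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def _rotate_up(child, parent):
    """Single rotation lifting child above parent."""
    if parent.left is child:
        parent.left, child.right = child.right, parent
    else:
        parent.right, child.left = child.left, parent

def splay(root, key):
    """Bottom-up splay of key, or of the last node on its search path."""
    if root is None:
        return None
    path, node = [], root
    while node is not None:
        path.append(node)
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    n = path.pop()
    while path:
        p = path.pop()
        if not path:                          # zig
            _rotate_up(n, p)
            break
        g = path.pop()
        gg = path[-1] if path else None
        g_was_left = gg is not None and gg.left is g
        if (g.left is p) == (p.left is n):    # zig-zig
            _rotate_up(p, g)
            _rotate_up(n, p)
        else:                                 # zig-zag
            _rotate_up(n, p)
            if g.left is p:
                g.left = n
            else:
                g.right = n
            _rotate_up(n, g)
        if gg is not None:
            if g_was_left:
                gg.left = n
            else:
                gg.right = n
    return n

def split(root, key):
    """Return (L, R): every key in L is <= key, every key in R is > key."""
    root = splay(root, key)
    if root is None:
        return None, None
    if root.key <= key:                       # root belongs to L: cut right link
        right, root.right = root.right, None
        return root, right
    left, root.left = root.left, None         # root belongs to R: cut left link
    return left, root

def insert(root, key):
    left, right = split(root, key)
    if left is not None and left.key == key:  # assumption: ignore duplicates
        left.right = right
        return left
    return Node(key, left, right)             # new key becomes the root

def join(left, right):
    """Merge two trees where all keys in left are smaller than all keys in right."""
    if left is None:
        return right
    m = left
    while m.right is not None:                # walk to the maximum of left
        m = m.right
    left = splay(left, m.key)                 # the maximum has no right child
    left.right = right
    return left

def delete(root, key):
    root = splay(root, key)
    if root is None or root.key != key:       # key absent: nothing to remove
        return root
    return join(root.left, root.right)

def shape(t):
    return None if t is None else (t.key, shape(t.left), shape(t.right))

def example_tree():
    """Tree from the Insert and Delete examples: 6 over 1 and 9, 4 under 1, 2 under 4, 7 under 9."""
    return Node(6, Node(1, None, Node(4, Node(2))), Node(9, Node(7)))

after_insert_5 = shape(insert(example_tree(), 5))
after_delete_4 = shape(delete(example_tree(), 4))
```

On the example tree, `insert(..., 5)` puts 5 at the root with the subtree rooted at 4 on its left and the subtree rooted at 6 on its right, and `delete(..., 4)` ends with 2 at the root, as in the Find max figure.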
Splay Trees, Summary
Splay trees are arguably the most practical kind of self-balancing trees
If the number of finds is much larger than n, then locality is crucial! Example: word-counting
Also supports efficient Split and Join operations, which are useful for other tasks, e.g., range queries
Dictionary & Search ADTs
Dictionary ADT (aka map ADT)
Stores values associated with user-specified keys
keys may be any (homogeneous) comparable type
values may be any (homogeneous) type
Search ADT (aka Set ADT): stores keys only
Dictionaries associate some key with a value, just like a real dictionary (where the key is a word and the value is its definition). In this example, I’ve stored user-IDs associated with descriptions of their coolness level. This is probably the most valuable and widely used ADT we’ll hit. I’ll give you an example in a minute that should firmly entrench this concept.
Dictionary & Search ADTs
create : → dictionary
insert : dictionary × key × values → dictionary
find : dictionary × key → values
delete : dictionary × key → dictionary
Example contents:
kim chi: spicy cabbage
Kreplach: tasty stuffed dough
Kiwi: Australian fruit
insert(kohlrabi, upscale tuber)
find(kreplach) returns kreplach: tasty stuffed dough
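The four operations can be tried out with a deliberately naive implementation. The sketch below is only an illustration of the interface: the class name `ListDictionary` is made up, it mutates in place instead of returning a new dictionary as the signatures suggest, and it assumes that re-inserting a key overwrites its value and that `find` returns `None` for a missing key.

```python
class ListDictionary:
    """Dictionary ADT backed by an unsorted list of (key, value) pairs."""

    def __init__(self):                      # create
        self._items = []

    def insert(self, key, value):
        for i, (k, _) in enumerate(self._items):
            if k == key:                     # assumption: overwrite existing key
                self._items[i] = (key, value)
                return
        self._items.append((key, value))

    def find(self, key):
        for k, v in self._items:             # linear scan: O(n)
            if k == key:
                return v
        return None                          # assumption: None means "not found"

    def delete(self, key):
        self._items = [(k, v) for k, v in self._items if k != key]


d = ListDictionary()
d.insert("kim chi", "spicy cabbage")
d.insert("kreplach", "tasty stuffed dough")
d.insert("kiwi", "Australian fruit")
d.insert("kohlrabi", "upscale tuber")
found = d.find("kreplach")
d.delete("kiwi")
missing = d.find("kiwi")
```

Swapping the list for a splay tree or B-tree changes the running times but not this interface.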
Dictionary Implementations
Arrays: unsorted, sorted
Linked lists
BST: random, AVL, splay
Dictionary Implementations
[Table: cost of insert, find, and delete for arrays (unsorted, sorted), lists, and binary search trees (plain, AVL, splay). Entries that survive from the slide: O(1), O(n), O(log n), O(log n) amortized, find + O(1).]
The last dictionary we discuss: B-Trees
Suppose we want to store the data on disk
A disk access is a lot more expensive than one CPU operation
Example: 1,000,000 entries in the dictionary
An AVL tree requires about log2(1,000,000) ≈ 20 disk accesses, which is expensive
Idea in B-trees: increase the fan-out, decrease the height
Make 1 node = 1 block
B-Trees Basics
All keys are stored at leaves
Nonleaf nodes have guidance keys, to help the search
Parameter d = the degree (the book uses the order M = 2d+1)
Rules for keys:
The root is either a leaf, or has between 1 and 2d keys
All other nodes (except the root) have between d and 2d keys
Rule for number of children: each node (except leaves) has one more child than keys
Balance rule: the tree is perfectly balanced!
B-Trees Basics
A non-leaf node:
[Figure: node with guidance keys 30, 120, 240 and four child pointers, leading to keys k < 30, 30 <= k < 120, 120 <= k < 240, and 240 <= k]
A leaf node:
[Figure: leaf with keys 40, 50, 60, pointers to the records with keys 40, 50, and 60, and a pointer to the next leaf]
Then called a B+ tree
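B+Tree Example
d = 2 (M = 5)
Find the key 40
[Figure: root (80); internal nodes (20, 60) and (100, 120, 140); leaves (10 15 18), (20 30 40 50), (60 65), (80 85 90). The search compares 40 with 80 at the root, then with 20 and 60 in the next node, then scans the leaf (20 30 40 50) until it reaches 40.]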
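The search for 40 can be traced with a few lines of code. This is a sketch under assumptions: the `Leaf` and `Internal` classes and the `find` function are hypothetical, child i of an internal node is taken to hold the keys k with keys[i-1] <= k < keys[i] (as on the non-leaf node slide), and the leaves under 100, 120, and 140 are invented placeholders because the slide does not show them.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys


class Internal:
    def __init__(self, keys, children):
        # An internal node has one more child than guidance keys.
        assert len(children) == len(keys) + 1
        self.keys, self.children = keys, children


def find(node, key):
    """Return (found, nodes_visited) when searching for key from node."""
    visited = 1
    while isinstance(node, Internal):
        # bisect_right sends a key equal to a guidance key to the right child.
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    return key in node.keys, visited


left = Internal([20, 60], [Leaf([10, 15, 18]), Leaf([20, 30, 40, 50]), Leaf([60, 65])])
right = Internal([100, 120, 140], [
    Leaf([80, 85, 90]),
    Leaf([100, 110]), Leaf([120, 130]), Leaf([140, 150]),  # hypothetical leaves
])
root = Internal([80], [left, right])

result = find(root, 40)
```

Each node visited corresponds to one block read, so `nodes_visited` stands in for the number of disk accesses.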
B+Tree Design
How large d?
Example:
Key size = 4 bytes
Pointer size = 8 bytes
Block size = 4096 bytes
2d × 4 + (2d+1) × 8 <= 4096
d = 170
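The inequality can be solved for d directly: 2d × 4 + (2d+1) × 8 = 24d + 8, so d is the largest integer with 24d + 8 <= 4096. A small sketch of the same calculation for arbitrary sizes follows; the function name `max_degree` and its parameter names are made up for illustration.

```python
def max_degree(block_size, key_size, pointer_size):
    """Largest d with 2d*key_size + (2d+1)*pointer_size <= block_size."""
    # Rearranged: 2d*(key_size + pointer_size) + pointer_size <= block_size
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))


d = max_degree(block_size=4096, key_size=4, pointer_size=8)
bytes_used = 2 * d * 4 + (2 * d + 1) * 8      # space taken by a full node
bytes_if_larger = 2 * (d + 1) * 4 + (2 * (d + 1) + 1) * 8
```

With the slide's numbers, a full node with d = 170 uses 4088 of the 4096 bytes, and d = 171 would no longer fit.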
B+ Trees Depth
Assume d = 170
How deep is the B-tree?
Depth = 0 (just the root): at least 170 keys
Depth = 1: at least 171 × 170 ≈ 30 × 10^3 keys
Depth = 2: 171^2 × 170 ≈ 5 × 10^6 keys
Depth = 3: 171^3 × 170 ≈ 850 × 10^6 keys
Depth = 4: 171^4 × 170 ≈ 145 × 10^9 keys
Nobody has more keys!
With a B-tree we can find any data item with at most 5 disk accesses!
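The growth of these lower bounds can be checked numerically. The sketch below follows the slide's simplification that every node, including the root, holds at least d keys and therefore has at least d + 1 children; the function name `min_keys` is an assumption.

```python
def min_keys(d, depth):
    """Lower bound on the keys stored in the leaves of a tree of this depth."""
    leaves = (d + 1) ** depth      # each level multiplies the node count by d + 1
    return d * leaves              # each leaf holds at least d keys


bounds = [min_keys(170, depth) for depth in range(5)]
```

A tree of depth 4 has 5 levels, which is where the bound of at most 5 disk accesses per lookup comes from.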
Insertion in a B+ Tree
Insert (K, P)
Find leaf where K belongs, insert
If no overflow (2d keys or less), halt
If overflow (2d+1 keys), split node, insert in parent:
[Figure: a node with keys K1 K2 K3 K4 K5 and pointers P0 P1 P2 P3 P4 P5 splits into a left node with K1 K2 (P0 P1 P2) and a right node with K4 K5 (P3 P4 P5); K3 moves up into the parent]
If leaf, keep K3 too in right node
When root splits, new root has 1 key only
Insertion in a B+ Tree
Insert K=19
[Figure: root (80); internal nodes (20, 60) and (100, 120, 140); leaves (10 15 18), (20 30 40 50), (60 65), (80 85 90)]
After insertion
[Figure: the first leaf becomes (10 15 18 19); the rest of the tree is unchanged]
Now insert 25
[Figure: same tree, with leaves (10 15 18 19), (20 30 40 50), (60 65), (80 85 90)]
After insertion
[Figure: the second leaf becomes (20 25 30 40 50)]
But now have to split!
[Figure: the leaf (20 25 30 40 50) holds 2d+1 = 5 keys]
After the split
[Figure: root (80); internal nodes (20, 30, 60) and (100, 120, 140); leaves (10 15 18 19), (20 25), (30 40 50), (60 65), (80 85 90)]
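The split rule (K1 K2 to the left, K3 up to the parent, K4 K5 to the right, with K3 also kept in the right node when splitting a leaf) can be written as a small function. This is only a sketch of the key bookkeeping: `split_node` is a hypothetical name, pointers are left out, and the internal-node input in the second call is invented for illustration.

```python
def split_node(keys, d, is_leaf):
    """Split an overflowing node of 2d+1 keys into (left, separator, right)."""
    if len(keys) != 2 * d + 1:
        raise ValueError("a split is only needed at exactly 2d+1 keys")
    left, separator = keys[:d], keys[d]
    # A leaf keeps the separator in its right half; an internal node does not.
    right = keys[d:] if is_leaf else keys[d + 1:]
    return left, separator, right


# The overflowing leaf from the Insert 25 example (d = 2).
leaf_split = split_node([20, 25, 30, 40, 50], d=2, is_leaf=True)

# A made-up internal node, to show that the separator is not duplicated.
inner_split = split_node([20, 30, 60, 100, 120], d=2, is_leaf=False)
```

For the leaf, the separator 30 is what gets inserted into the parent, turning (20, 60) into (20, 30, 60).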
Deletion from a B+ Tree
Delete 30
[Figure: root (80); internal nodes (20, 30, 60) and (100, 120, 140); leaves (10 15 18 19), (20 25), (30 40 50), (60 65), (80 85 90)]
After deleting 30
[Figure: the leaf (30 40 50) becomes (40 50); the guidance key 30 in the parent may change to 40, or not]
Now delete 25
After deleting 25: need to rebalance. Rotate.
[Figure: the leaf (20 25) becomes (20), which has fewer than d = 2 keys; its left sibling is (10 15 18 19)]
Now delete 40
[Figure: after the rotation, internal node (19, 30, 60); leaves (10 15 18), (19 20), (40 50), (60 65), (80 85 90)]
After deleting 40: rotation not possible, need to merge nodes
[Figure: the leaf (40 50) becomes (50); its left sibling is (19 20)]
Final tree
[Figure: root (80); internal nodes (19, 60) and (100, 120, 140); leaves (10 15 18), (19 20 50), (60 65), (80 85 90)]
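The choice between rotating and merging in the deletion example depends only on whether the sibling can spare a key. The sketch below covers just the case shown on the slides, an underflowing leaf with a left sibling; `rebalance_leaf` and its return format are assumptions, and updating the parent's guidance keys is left to the caller.

```python
def rebalance_leaf(left, right, d):
    """Repair an underflowing right leaf using its left sibling.

    Returns ("rotate", new_left, new_right, new_separator) or
    ("merge", merged_leaf, None, None).
    """
    if len(right) >= d:
        raise ValueError("right leaf has not underflowed")
    if len(left) > d:                        # sibling can spare a key: rotate
        new_left, new_right = left[:-1], [left[-1]] + right
        return "rotate", new_left, new_right, new_right[0]
    return "merge", left + right, None, None  # sibling at minimum: merge


# After deleting 25: leaf (20) borrows 19 from (10 15 18 19).
after_25 = rebalance_leaf([10, 15, 18, 19], [20], d=2)

# After deleting 40: leaf (50) cannot borrow from (19 20), so they merge.
after_40 = rebalance_leaf([19, 20], [50], d=2)
```

After a rotation the parent's guidance key becomes the new first key of the right leaf (19 in the example); after a merge the parent drops the guidance key that separated the two leaves (30 in the example).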