Union-Find with Constant Time Deletions Stephen Alstrup Inge Li Gørtz Theis Rauhe Mikkel Thorup Uri Zwick
Union-Find: Make(x): create a set containing x. Union(A,B): unite the sets A and B. Find(x): return the set containing x. Delete(x): delete x from its set.
Applications of Union-Find: maintaining an equivalence relation; computing minimum spanning trees; computing dominators in graphs; many other algorithms.
Applications of Union-Find with deletions: implementation of meldable priority queues. Meldable priority queues are used to compute minimum directed spanning trees.
Union-Find without deletions. Worst case: make, unite, find: O(1), O(log n), O(k), O($\log_k n$) [Smid ’90]. Amortized: make, unite, find: O(1), O(α(m+n,n)) [Tarjan ’75].
Amortized analysis. [Tarjan ’75] [van Leeuwen, Tarjan ’84] The cost of any intermixed sequence containing n make operations and m find operations is O( n + m α(m+n,n) ). [Kaplan, Shafrir, Tarjan ’02] The amortized cost of each find operation is only α(m+n,n,l), where l is the number of elements in the set found.
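Ackermann’s function: $A_0(j) = j+1$, $A_i(j) = A_{i-1}^{(j+1)}(j)$ for $i \ge 1$, where $A_{i-1}^{(j+1)}$ denotes $A_{i-1}$ iterated $j+1$ times. Grows extremely FAST. Inverse functions: $\alpha(n) = \min\{ k : A_k(1) \ge n \}$ and $\alpha(m,n) = \min\{ k : A_k(m/n) \ge n \}$. They grow extremely slowly.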
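The definitions on this slide can be evaluated directly for small arguments. The sketch below is only an illustration: the function names `ackermann_capped` and `alpha` are hypothetical, and the cap is a convenience (not part of the definition) that stops the computation as soon as the value reaches the threshold, since the exact values are astronomically large.

```python
def ackermann_capped(i, j, cap):
    """Return min(A_i(j), cap), where A_0(j) = j + 1 and
    A_i(j) is A_{i-1} applied (j + 1) times to j."""
    if j >= cap:
        # A_i(j) > j for every i, so the result already exceeds the cap.
        return cap
    if i == 0:
        return min(j + 1, cap)
    x = j
    for _ in range(j + 1):
        x = ackermann_capped(i - 1, x, cap)
        if x >= cap:
            # A_{i-1} is increasing, so further iterations stay above the cap.
            return cap
    return x


def alpha(n):
    """Smallest k with A_k(1) >= n (the one-argument inverse on the slide)."""
    k = 0
    while ackermann_capped(k, 1, n) < n:
        k += 1
    return k


if __name__ == "__main__":
    for n in (1, 2, 3, 7, 8, 2047, 2048):
        print(n, alpha(n))
```

With this definition A_0(1) = 2, A_1(1) = 3, A_2(1) = 7 and A_3(1) = 2047, so `alpha` is at most 4 for any input of practical size.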
Union-Find with deletions [Kaplan, Shafrir, Tarjan ’02] Delete operations are not more expensive than find operations. They can thus be implemented in O(log l) worst case time and O(α(m+n,n,l)) amortized time. [Here] Delete operations can be implemented in O(1) worst case and amortized time.
Union Find: Represent each set as a rooted tree. Union by rank. Path compression. The parent of a vertex x is denoted by p[x]. Find(x) traces the path from x to the root.
Union by rank: if the two roots have the same rank r, the new root gets rank r+1; if r1 < r2, the root of rank r1 is hung on the root of rank r2, whose rank is unchanged. Union by rank on its own gives O(log l) find time. A tree of rank k contains at least $2^k$ elements. If x is not a root, then rank(x) < rank(p[x]).
Path Compression
Handling deletions Simplest thing to do: Ignore them! Space O(N) instead of O(n), find time O(log N) instead of O(log l), where N is the # of elements ever created. Next thing to do: Global Rebuilding Keep track of the number of elements that were deleted. If at least, say, half of the elements are deleted, rebuild all trees. Easily works in the amortized setting. Can be done in the “background”. Find time is O(log n) or O(α(m+n,n)).
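The global rebuilding idea can be sketched in the amortized setting as follows. This is an illustration under assumptions, not the construction from the talk: the class name `LazyDeleteUnionFind`, the `live` set, and the choice to rebuild each set as a flat tree are all made up for the example, and the "background" (worst-case) variant is not shown.

```python
class LazyDeleteUnionFind:
    """Union-find where deleted elements keep their nodes until at least
    half of the nodes belong to deleted elements; then all trees are rebuilt."""

    def __init__(self):
        self.p = {}          # parent pointers, including deleted elements
        self.rank = {}
        self.live = set()    # elements that have not been deleted
        self.deleted = 0     # deleted elements still occupying nodes

    def make(self, x):
        self.p[x] = x
        self.rank[x] = 0
        self.live.add(x)

    def find(self, x):
        root = x
        while self.p[root] != root:
            root = self.p[root]
        while self.p[x] != root:
            self.p[x], x = root, self.p[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.p[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return ra

    def delete(self, x):
        if x not in self.live:
            raise KeyError(x)
        self.live.remove(x)          # "ignore" the deletion: the node stays
        self.deleted += 1
        if 2 * self.deleted >= len(self.p):
            self._rebuild()

    def _rebuild(self):
        # Group the live elements by their current set.
        groups = {}
        for x in self.live:
            groups.setdefault(self.find(x), []).append(x)
        # Rebuild every set as a flat tree over its live elements only.
        self.p, self.rank, self.deleted = {}, {}, 0
        for members in groups.values():
            root = members[0]
            self.p[root] = root
            self.rank[root] = 1 if len(members) > 1 else 0
            for x in members[1:]:
                self.p[x] = root
                self.rank[x] = 0
```

A rebuild costs time proportional to the number of nodes, which is charged to the deletions that triggered it; this is why the approach is straightforward only in the amortized setting.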
Handling deletions [Kaplan, Shafrir, Tarjan ’02] Local rebuilding For each set keep an old tree and a new tree. When an element is deleted from the old tree, move four elements to the new tree. At least ¼ of the elements of the old tree are not deleted and at least ½ of the elements of the new tree are not deleted. When the old tree is empty, the new tree takes its place and a new new tree is constructed. For each delete we need to do a find to know from which tree the element is deleted
Deletions in constant time Keep trees tidy. Following each delete, and in some other cases, perform a constant number of short-cut operations. Works in both the worst case and amortized settings
Tidy and untidy trees: Nodes are either occupied or vacant. A tree is tidy if: every leaf is occupied and has rank 0; every vacant non-root node has at least two children. At least ½ of the nodes of a tidy tree are occupied.
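The tidiness conditions can be checked mechanically. The sketch below assumes a hypothetical `Node` class in which `item is None` marks a vacant node; the names `Node`, `all_nodes` and `is_tidy` are illustrative and not taken from the talk.

```python
class Node:
    def __init__(self, item=None, rank=0):
        self.item = item        # None means the node is vacant (assumption)
        self.rank = rank
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child


def all_nodes(root):
    """Yield every node of the tree rooted at `root`."""
    stack = [root]
    while stack:
        v = stack.pop()
        yield v
        stack.extend(v.children)


def is_tidy(root):
    for v in all_nodes(root):
        occupied = v.item is not None
        # Every leaf is occupied and has rank 0.
        if not v.children and not (occupied and v.rank == 0):
            return False
        # Every vacant non-root node has at least two children.
        if not occupied and v.parent is not None and len(v.children) < 2:
            return False
    return True


if __name__ == "__main__":
    root = Node(item=None, rank=2)             # vacant root
    inner = root.add_child(Node(item=None, rank=1))
    inner.add_child(Node(item="a"))
    inner.add_child(Node(item="b"))
    root.add_child(Node(item="c"))
    occupied = sum(v.item is not None for v in all_nodes(root))
    total = sum(1 for _ in all_nodes(root))
    print(is_tidy(root), occupied, total)
```

In the example tree, three of the five nodes are occupied, in line with the claim that at least half of the nodes of a tidy tree are occupied.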
Reduced trees: A reduced tree is a tidy tree of height 1 whose root is of rank 1.
Deleting an element from a tidy tree: Remove the element from its node. If a leaf is now vacant, remove it from the tree. If a new leaf is created, reduce its rank to 0. If a vacant non-root element with only one child is created, short-cut it.
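The deletion rules on this slide can be sketched as below. This is an illustration under assumptions: the `Node` class, the use of `item is None` for vacant nodes, and the helper `_tidy_up` are hypothetical. Children are kept in Python lists for brevity, so `list.remove` is not constant time here; a constant-time version would need linked child lists. The additional short-cutting steps that keep the tree shallow are not included.

```python
class Node:
    def __init__(self, item=None, rank=0, parent=None):
        self.item = item            # None means vacant (assumption)
        self.rank = rank
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def delete(node):
    """Delete the element stored at `node` from a tidy tree."""
    node.item = None                # remove the element from its node
    _tidy_up(node)


def _tidy_up(v):
    parent = v.parent
    if v.item is not None:
        if not v.children:
            v.rank = 0              # a newly created leaf gets rank 0
        return
    if not v.children:
        if parent is None:
            return                  # vacant root without children: empty set
        parent.children.remove(v)   # a vacant leaf is removed from the tree
        v.parent = None
        _tidy_up(parent)            # the parent may now be a leaf or have one child
    elif len(v.children) == 1 and parent is not None:
        # A vacant non-root node with one child is short-cut:
        # its child is hung on its parent and the node disappears.
        child = v.children[0]
        parent.children.remove(v)
        parent.children.append(child)
        child.parent = parent
        v.children = []
        v.parent = None
```

Because every vacant non-root node of a tidy tree has at least two children, the recursion in `_tidy_up` goes up by at most one level.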
Keeping tidy trees shallow: After each deletion, perform seven short-cutting steps. Short-cut(v): “Take a grandchild of v and hang it on v”. Each short-cutting step is slightly more complicated but is still quite simple and takes only constant time.
short-cut(v) Case 1: v has an occupied child which has a child.
short-cut(v) Case 2: v has a vacant child with at least three children.
short-cut(v) Case 3: v has a vacant child with two occupied children.
short-cut(v) Case 4: v has a vacant grandchild with at least three children.
short-cut(v) Case 5: v has a vacant grandchild with only two children.
short-cut(v) Case 6: If v does not have grandchildren, let v←p[p[v]] and try again. Case 7: If v does not have grandchildren and is a root, change the rank of v to 1. The tree is now reduced.
The trees are shallow. Theorem: $|A| \ge (2/3)(6/5)^{rank(A)}$. Corollary: $rank(A) \le \log_{6/5}(3|A|/2) = O(\log|A| + 1)$. Find takes O(log l) worst-case time.
Values. Value of a node x: $val(x) = (5/3)^{rank(p[x])}$ if x is occupied, and $val(x) = (1/2)(5/3)^{rank(p[x])}$ if x is vacant. Value of a set A: $VAL(A) = \sum_{x \in A} val(x)$. Theorem: $VAL(A) \ge 2^{rank(A)}$.
$VAL(A) \ge 2^{rank(A)}$. It is easy to show that make and union maintain this property. Show that a delete operation, followed by seven short-cutting steps, either does not decrease the value, or generates a reduced tree. Later we will show that this property is also maintained by path compression, with an appropriate collection of short-cuts.
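The tree representing A contains exactly |A| occupied nodes and at most |A| vacant nodes. Each occupied node has value at most $(5/3)^{rank(A)}$ and each vacant node at most $(1/2)(5/3)^{rank(A)}$, so $(3/2)\,|A|\,(5/3)^{rank(A)} \ge VAL(A) \ge 2^{rank(A)}$. Dividing by $(3/2)(5/3)^{rank(A)}$ gives $|A| \ge (2/3)(6/5)^{rank(A)}$.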
How much value is lost by delete? Let k = rank[v]. Change in value: $-(5/3)^{k-1} - (1/2)(5/3)^k + (1/2)\left((5/3)^k - (5/3)^{k-1}\right) = -(9/10)(5/3)^k$, so the value lost is $(9/10)(5/3)^k$.
short-cut(v) Case 1: v has an occupied child which has a child. Gain: $(1/2)\left((5/3)^k - (5/3)^{k-1}\right) = (1/5)(5/3)^k$
short-cut(v) Case 2: v has a vacant child with at least three children. Gain: $(1/2)\left((5/3)^k - (5/3)^{k-1}\right) = (1/5)(5/3)^k$
short-cut(v) Case 3: v has a vacant child with two occupied children. Gain: $2\left((5/3)^k - (5/3)^{k-1}\right) - (1/2)(5/3)^k = (3/10)(5/3)^k$
short-cut(v) Case 4: v has a vacant grandchild with at least three children. Gain: $(1/2)\left((5/3)^k - (5/3)^{k-2}\right) = (8/25)(5/3)^k$
short-cut(v) Case 5: v has a vacant grandchild with only two children. Gain: $(1/2)\left((5/3)^k - (5/3)^{k-2}\right) + (1/2)\left((5/3)^{k-1} - (5/3)^{k-2}\right) - (1/2)(5/3)^{k-1} = (7/50)(5/3)^k$
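The arithmetic behind the loss and the five gains can be checked with exact rational numbers. The sketch below only re-evaluates the expressions stated on the slides, in units of (5/3)^k; the names `pw`, `loss` and `gains` are illustrative.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def pw(d):
    """(5/3)^(k+d), expressed in units of (5/3)^k."""
    return Fraction(5, 3) ** d


# Value lost by a delete, as stated on the "How much value is lost" slide.
loss = pw(-1) + HALF * pw(0) - HALF * (pw(0) - pw(-1))

# Gain of one short-cut in each of Cases 1 to 5.
gains = {
    1: HALF * (pw(0) - pw(-1)),
    2: HALF * (pw(0) - pw(-1)),
    3: 2 * (pw(0) - pw(-1)) - HALF * pw(0),
    4: HALF * (pw(0) - pw(-2)),
    5: HALF * (pw(0) - pw(-2)) + HALF * (pw(-1) - pw(-2)) - HALF * pw(-1),
}

if __name__ == "__main__":
    print("loss:", loss)
    for case, gain in gains.items():
        print("case", case, "gain:", gain)
    print("seven smallest gains cover the loss:", 7 * min(gains.values()) >= loss)
```

The smallest gain is 7/50, and seven such gains give 49/50, which exceeds the loss of 9/10; six would give only 42/50. This matches the choice of seven short-cutting steps after each deletion.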
Amortized bounds: Path compression and short-cutting combine nicely together. After each path compression we need to do some tidying up and some short-cuts to maintain the value. We give a new potential-based analysis for local amortized bounds.
Melding Priority Queues Ran Mendelson Robert E. Tarjan Mikkel Thorup Uri Zwick
Improved analysis of transformation: Non-meldable Priority Queue (pq(n) time per operation) → Meldable Priority Queue (pq(n)+α(n) time per operation, or pq(n)α(n,n/pq(n)) time per operation).
Second transformation: Meldable Priority Queue; pq(n) time per operation; pq(N) time per operation; n – number of elements in priority queue; keys are in {1,2,…,N}.
Meldable Priority Queues: Insert O(1); Delete O(log n); Find-Min O(1); Dec-Key O(1); Meld O(1). Amortized [Fredman-Tarjan ’87]; worst case [Brodal ’96]. Best possible comparison-based results.
RAM Priority Queues: Keys are integers that fit into a single machine word. Standard arithmetical and logical operations take constant time. Insert O(1); Delete O(log log n); Find-Min O(1); Dec-Key O(1); Meld: NO [Thorup ’03]. Meld is added using our transformation.
Atomic heaps: Insert, Delete, Find-Min O(1); Meld: NO [Fredman-Willard ’94]. At most O($\log^2 n$) elements!
Non-meldable priority queue + Union Find with deletions
Use the union-find data structure to maintain the sets. Place a non-meldable priority queue at each node of a union-find tree, holding the minimal element in each one of its subtrees.
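The construction can be sketched as follows for insert, meld and find-min only. This is an illustration under assumptions: Python's `heapq` stands in for the non-meldable priority queue, each node's queue holds its own key plus the minimum of each child subtree, and the names `MeldableNode`, `find`, `meld`, `insert` and `find_min` are hypothetical. Deletions and path compression, which the talk handles separately, are not included.

```python
import heapq


class MeldableNode:
    """A union-find node carrying a small (non-meldable) priority queue."""

    def __init__(self, key):
        self.parent = self      # a root is its own parent
        self.rank = 0
        self.heap = [key]       # own key + minimum of each child subtree


def find(x):
    """Return the root of the tree containing x (no path compression here)."""
    while x.parent is not x:
        x = x.parent
    return x


def find_min(x):
    """Minimum key of the queue containing x: the top of the root's heap."""
    return find(x).heap[0]


def meld(x, y):
    """Unite the queues containing x and y, using union by rank."""
    a, b = find(x), find(y)
    if a is b:
        return a
    if a.rank < b.rank:
        a, b = b, a
    b.parent = a
    if a.rank == b.rank:
        a.rank += 1
    # The new parent records the minimum of its new child subtree.
    heapq.heappush(a.heap, b.heap[0])
    return a


def insert(x, key):
    """Insert a key into the queue containing x by melding a singleton."""
    return meld(x, MeldableNode(key))


if __name__ == "__main__":
    a, b = MeldableNode(9), MeldableNode(5)
    meld(a, b)
    insert(a, 7)
    print(find_min(b))
```

A meld costs one insertion into a single node's queue, which is why only the non-meldable operations of the underlying queue are needed.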
Handling deletions using path compression. The amortized delete cost is O(pq(n)α(n)) [MTZ’04]. [van Emde Boas, Kaas, Zijlstra ’77]
Flavor of improved analysis: rank ≥ k, size ≥ $2^k$ (at most $n/2^k$ nodes); rank < k, size < $2^k$. Choose k = 2 log log n. If f > n/log n, we are done.
More flavor of improved analysis: rank ≥ k, size ≥ $2^k$; rank < k, size ≥ $2^k$; rank < k, size < $2^k$.
Sorting; Worst-case non-meldable priority queues; Amortized meldable priority queues.