Union-Find.


Example: Computer Networks
How do you:
- Discover if a node can reach another node?
- Determine how to route messages to nodes not directly connected?
- Control access to a common resource?
- ...
Data Structures + Algorithms

The Connectivity Problem (in the context of self-configurable wireless networks)
Who’s out there? A node i can send messages to every other node j only if the network is connected.

Three Mathematical Constructs
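Graph: $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of edges.
Tree: a tree $T = (V', E')$ (a connected acyclic graph) with $E' \subseteq E$ for which $V' = V$ is called a spanning tree on $G$. Note that a spanning tree on $G$ has $|V| - 1$ edges.
[Figure: example graph on the vertices 1 to 9.]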

Constructing a Spanning Tree
Generic-ST(G)
    A := ∅
    while A is not a ST
        do find an edge (u,v) that is safe for A
           A := A ∪ {(u,v)}
    return A
Invariant: Prior to each iteration, A is a subset of some spanning tree.
This is an ALGORITHM, a systematic description of the steps toward the solution. Because it’s generic, we haven’t specified any data-structure to work with; one can assume that anything that works is good enough. Will that be true? What about efficiency? Some data-structures will be better suited to help with this task than others.
This is a GREEDY algorithm:
- the strategy is to make the choice that is BEST at the moment.
- the drawback is that, in general, greedy choices are not guaranteed to lead to optimal solutions.
Note how the algorithm depends on two operations: find and union.
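One way to make Generic-ST concrete, as a sketch only: treat an edge as safe when its endpoints are not yet connected by the edges already in A. The function name `generic_spanning_tree` and the naive `label` list used for the connectivity test are assumptions for illustration; they are not taken from the slides.

```python
def generic_spanning_tree(n, edges):
    """Grow a spanning tree A on vertices 0..n-1 from an iterable of edges.

    Assumption: an edge (u, v) is "safe" when u and v are not yet
    connected by the edges already in A.
    """
    label = list(range(n))          # label[v] identifies v's component
    tree = []                       # the edge set A
    for u, v in edges:
        if label[u] != label[v]:    # find: are u and v in different components?
            tree.append((u, v))     # A := A U {(u, v)}
            old, new = label[u], label[v]
            for w in range(n):      # union: relabel one whole component
                if label[w] == old:
                    label[w] = new
        if len(tree) == n - 1:      # A is now a spanning tree
            break
    return tree


print(generic_spanning_tree(4, [(0, 1), (1, 2), (0, 2), (2, 3)]))
```

The edge (0, 2) is skipped because 0 and 2 are already connected when it is read. The find and union steps are written naively here on purpose; the rest of the deck is about doing them efficiently.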

Union-Find Algorithm
[Figure: the nodes 0 to 9, each in its own set, next to the Input and Output columns.]
Pre-condition: each node is by itself in a set of size 1.
Invariant: all nodes in a set are connected.
First step: find the set A that contains p, find the set B that contains q.
Second step: if there exists an edge (p,q) then by transitivity all nodes in A and B are connected. Substitute A and B by a new set computed as (A U B).
Loop: repeat both steps for every edge (p,q) in the input.
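The two steps can be sketched directly with Python sets. This is a minimal illustration, not an efficient representation; the name `connect` and the `set_of` list are hypothetical, and the choice to return only the pairs that create a new connection is an assumption about what the Output column shows.

```python
def connect(n, pairs):
    """Process pairs (p, q) on nodes 0..n-1 and return the new connections.

    set_of[v] is the set object currently containing node v
    (assumed representation: one Python set shared by all its members).
    """
    set_of = [{v} for v in range(n)]   # pre-condition: singleton sets
    output = []
    for p, q in pairs:
        a, b = set_of[p], set_of[q]    # first step: find A and B
        if a is b:                     # p and q already connected
            continue
        merged = a | b                 # second step: replace A and B by A U B
        for v in merged:
            set_of[v] = merged
        output.append((p, q))
    return output


print(connect(10, [(3, 4), (4, 9), (3, 9)]))
```

The pair (3, 9) produces no output because 3 and 9 are already in the same set by transitivity.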

Union-Find Algorithm
[Figure: step-by-step animation of the sets merging as the input pairs (3-4, 4-9, ..., 2-9, 5-6, 0-2) are read, with the Input and Output columns shown alongside.]
Great, but how do we represent sets?

Data-Structure for Union-Find
The id array is “just” an array of integer values. Each entry i corresponds to a node in the graph. The value in each entry, id[i], points to another node in the graph. With this simple arrangement, we can implement the concept of sets. Keep in mind: we only want to record which nodes are connected. If we learn that i is connected to j, we put them in the same set.

The quick-find solution to connectivity problem
[Figure: the id array for the nodes 0 to 9, and the sets it represents, redrawn after each input pair is processed.]

Start with the array such that id[i]=i, for all i.
repeat
    Read one edge, values p and q.
    Find: p and q are already connected exactly when id[p]=id[q].
    Union: otherwise, change every entry with id[i]=id[p] to id[i]=id[q], for all i.
until (there are no more edges to read in)
What is the cost of each of these steps?
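A minimal sketch of quick-find in Python, assuming the usual formulation: p and q are connected when id[p] equals id[q], and union relabels every entry that holds p's label. The class and method names are illustrative choices, not names from the slides.

```python
class QuickFind:
    def __init__(self, n):
        self.id = list(range(n))           # id[i] = i for all i

    def connected(self, p, q):
        # Find: one array access per node.
        return self.id[p] == self.id[q]

    def union(self, p, q):
        old, new = self.id[p], self.id[q]
        if old == new:                     # already in the same set
            return
        for i in range(len(self.id)):      # pass over the entire array
            if self.id[i] == old:
                self.id[i] = new


qf = QuickFind(10)
qf.union(3, 4)
qf.union(4, 9)
print(qf.connected(3, 9), qf.id)
```

Note that `old` is read before the loop starts; comparing against `self.id[p]` inside the loop would be a bug, because that entry changes partway through the pass.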

Analysis of the quick-find solution
Find operation: trivial; accesses a known position in the array.
Union operation: requires a pass over the entire array.
If N is the number of vertices and M is the number of union operations performed:
- for each union, we iterate the loop N times,
- we perform a total of MN loop iterations,
- M is at most as large as the number of edges in the input.

The Quick-union solution to the connectivity problem
[Figure: the forest of trees built by Quick-union, redrawn after each input pair is processed.]

Analysis of the Quick-union solution
Find operation: non-trivial. Starting from some node in the tree, we traverse the path upward until we find its root. This is done twice for an edge (p,q): once for p and once for q. Union operation: Trivial. To compute the union of two sets, all we have to do is change one pointer. Questions: How does it compare to Quick-find in terms of memory utilization? And in terms of execution time?
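A minimal sketch of quick-union in Python. The names are illustrative, and the link direction (the root of p is made to point to the root of q) is an assumption; the opposite direction works equally well.

```python
class QuickUnion:
    def __init__(self, n):
        self.id = list(range(n))       # every node starts as its own root

    def root(self, i):
        while self.id[i] != i:         # follow pointers up to the root
            i = self.id[i]
        return i

    def connected(self, p, q):
        return self.root(p) == self.root(q)

    def union(self, p, q):
        rp, rq = self.root(p), self.root(q)   # find, once for p and once for q
        if rp != rq:
            self.id[rp] = rq           # union: change one pointer


qu = QuickUnion(10)
for p, q in [(3, 4), (4, 9), (2, 3)]:
    qu.union(p, q)
print(qu.connected(2, 9), qu.id)
```

Memory use is the same as quick-find (one integer array of length N); what changes is where the time goes, from union to find.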

Analysis of the Quick-union solution
Claim: For a number of edges M>N in the input, Quick-union may traverse a number of pointers that is quadratic in N, the number of nodes.
Proof: Assume that the first N-1 pairs in the input are (0-1), (0-2), (0-3), (0-4), ..., (0-(N-1)), and that each union makes the root of p point to the root of q. (Don’t count self-references.)
Find for (0-1) traverses 0 pointers.
Find for (0-2) traverses 1 pointer.
Find for (0-3) traverses 2 pointers.
...
Find for (0-(N-1)) traverses N-2 pointers.
Number of pointers traversed for the N-1 pairs: $0 + 1 + 2 + \cdots + (N-2) = (N-1)(N-2)/2$.
[Figure: after these pairs the tree is a single chain 0 → 1 → 2 → ... → N-1.]
Regardless of what the remaining pairs are, the number of pointers traversed will be “bounded from below” by (N-1)(N-2)/2, which is quadratic in N.
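The count can be checked by simulation. The sketch below assumes each union makes the root of p point to the root of q and does not count the self-reference at a root; the function name is hypothetical. Under these assumptions the pair (0-k) costs k-1 pointers, so the N-1 pairs cost (N-1)(N-2)/2 in total.

```python
def pointers_traversed(n):
    """Total pointers followed by quick-union on pairs (0,1), ..., (0,n-1).

    Self-references at a root are not counted.
    """
    parent = list(range(n))
    total = 0

    def root(i):
        nonlocal total
        while parent[i] != i:
            i = parent[i]
            total += 1                 # one pointer followed
        return i

    for q in range(1, n):
        rp, rq = root(0), root(q)
        parent[rp] = rq                # assumed direction: root(p) -> root(q)
    return total


for n in (10, 100, 1000):
    print(n, pointers_traversed(n), (n - 1) * (n - 2) // 2)
```

Each printed line shows the simulated count next to the closed form, and the two agree.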

An Improvement to Quick-union
The goal is to make Find faster by keeping paths shorter. Say we have two trees with depths n and m, such that n < m. How can we best combine the two trees (union) so that the depth of the new tree is minimized?
[Figure: the two candidate unions of the trees with depths m and n.]

An Improvement to Quick-union
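What if the information we have is the number of nodes in each tree rather than their depths?
Claim: any tree with k nodes (size=k) constructed by our algorithm will have depth at most $\lg k$.
Proof (by induction):
Base case: at the start, we have sets of size 1 (depth = 0 = lg 1).
Induction hypothesis: every tree of size $i < k$ has depth at most $\lg i$.
Induction step: the union of two trees with n and m nodes, such that $n \le m$ and $n + m = k$, hangs the smaller tree under the root of the larger one and has depth at most $\max(\lg m,\ 1 + \lg n)$. Since $1 + \lg n = \lg 2n \le \lg(n + m) = \lg k$ and $\lg m \le \lg k$, the depth is at most $\lg k$.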

Weighted Quick-union
Data-structures: two arrays, id and size, with one entry for each node.
Each entry i corresponds to a node in the graph. The value in each entry, id[i], points to another node in the graph; size[i] tells us how many nodes that tree contains.
Algorithm: same as for Quick-union, except that we always connect the smaller tree to the larger tree.
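A minimal sketch of weighted quick-union in Python, assuming `size[r]` is only kept up to date for roots r and that ties are broken by linking the first root under the second. The names are illustrative.

```python
class WeightedQuickUnion:
    def __init__(self, n):
        self.id = list(range(n))
        self.size = [1] * n            # size[r] is meaningful for roots r only

    def root(self, i):
        while self.id[i] != i:
            i = self.id[i]
        return i

    def connected(self, p, q):
        return self.root(p) == self.root(q)

    def union(self, p, q):
        rp, rq = self.root(p), self.root(q)
        if rp == rq:
            return
        if self.size[rp] > self.size[rq]:
            rp, rq = rq, rp            # make rp the root of the smaller tree
        self.id[rp] = rq               # smaller tree hangs under the larger
        self.size[rq] += self.size[rp]


w = WeightedQuickUnion(10)
w.union(3, 4)
w.union(4, 9)
print(w.id, w.size[4])
```

In the second union the tree rooted at 4 has two nodes and the tree rooted at 9 has one, so 9 is linked under 4 rather than the other way around.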

Analysis of Weighted Quick-union
What is the cost of Find for each edge in the input? How has it changed from Quick-union? If the graph has N nodes and the input has M edges, what is the worst-case cost of this algorithm?

Worst-case for Weighted Quick-union
Think of an input pattern that causes every union to link together trees of equal size (which is a power of 2). Every union operation causes the depth of the tree to increase by one.
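Such an input pattern can be generated in rounds: first pair up single nodes, then pairs, then groups of four, and so on. The sketch below is an illustration under that assumption (the function name and the tie-breaking rule are hypothetical); it builds the forest for N = 2^k nodes and reports the resulting depth.

```python
def worst_case_depth(k):
    """Depth of the weighted quick-union tree after merging 2**k nodes
    in rounds, so that every union links two trees of equal size."""
    n = 2 ** k
    parent = list(range(n))
    size = [1] * n

    def root(i):
        while parent[i] != i:
            i = parent[i]
        return i

    def union(p, q):
        rp, rq = root(p), root(q)
        if size[rp] > size[rq]:
            rp, rq = rq, rp
        parent[rp] = rq                # equal sizes: first root goes under second
        size[rq] += size[rp]

    step = 1
    while step < n:                    # this round merges trees of size `step`
        for i in range(0, n, 2 * step):
            union(i, i + step)
        step *= 2

    def depth(i):
        d = 0
        while parent[i] != i:
            i = parent[i]
            d += 1
        return d

    return max(depth(i) for i in range(n))


for k in range(5):
    print(2 ** k, worst_case_depth(k))
```

The depth comes out as k = lg N, so the logarithmic bound on tree depth is actually reached by this input.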

Further Improvements?
Ideal: If every node were made to point directly to the root, unions would be trivial. Does this sound familiar? What cost did we pay when we attempted this?
[Figure: the tree before and after processing the pair (4-8), with every node pointing directly to the root.]

Two Heuristics
1) Union by Rank: Store rank of tree in rep. Rank ≈ tree size. Make root with smaller rank point to root with larger rank.
2) Path Compression: During Find-Set, “flatten” tree.
[Figure: Find-Set(a) on the chain a → b → c → d; afterwards a, b and c all point directly to d.]

Operations
Make-Set(x)
    p[x] := x;
    rank[x] := 0
Link(x, y)
    if rank[x] > rank[y]
        then p[y] := x
        else p[x] := y;
             if rank[x] = rank[y] then rank[y] := rank[y] + 1 fi
    fi
Find-Set(x)
    if x ≠ p[x] then p[x] := Find-Set(p[x]) fi;
    return p[x]
Union(x, y)
    Link(Find-Set(x), Find-Set(y))
Note: rank = upper bound on height.
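The four operations translate almost line for line into Python. This is a sketch: the class name is an illustrative choice, dictionaries stand in for the p and rank arrays so that elements can be any hashable value, and the early return in `union` when both elements already share a representative is an added guard that the pseudocode does not show.

```python
class DisjointSets:
    def __init__(self):
        self.p = {}        # parent pointers
        self.rank = {}     # upper bound on the height of each root's tree

    def make_set(self, x):
        self.p[x] = x
        self.rank[x] = 0

    def link(self, x, y):
        # x and y are assumed to be distinct roots.
        if self.rank[x] > self.rank[y]:
            self.p[y] = x
        else:
            self.p[x] = y
            if self.rank[x] == self.rank[y]:
                self.rank[y] += 1

    def find_set(self, x):
        if x != self.p[x]:
            self.p[x] = self.find_set(self.p[x])   # path compression
        return self.p[x]

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:       # added guard, not in the pseudocode
            return
        self.link(rx, ry)


ds = DisjointSets()
for name in "abcd":
    ds.make_set(name)
ds.union("a", "b")
ds.union("c", "d")
ds.union("a", "c")
print(ds.find_set("a"), ds.p, ds.rank["d"])
```

After the last `find_set("a")`, node a points directly at the representative d: the recursive call rewrites every parent pointer on the path as it unwinds. The recursion depth equals the path length, which union by rank keeps logarithmic.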

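Find-Set
[Figure: the chain a → b → c before Find-Set(a), and a and b pointing directly to c afterwards.]
F-S(a): p[a] := F-S(b)
    F-S(b): p[b] := F-S(c)
        F-S(c): return c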

Time Complexity
Tight upper bound on time complexity: $O(m\,\alpha(m,n))$.
$\alpha(m,n)$ = inverse of Ackermann’s function (almost a constant).
A slightly easier bound of $O(m \lg^* n)$ is established in CLR.

Ackermann’s Function
$A(1, j) = 2^j$ for $j \ge 1$
$A(i, 1) = A(i-1, 2)$ for $i \ge 2$
$A(i, j) = A(i-1, A(i, j-1))$ for $i, j \ge 2$
Grows very fast (inverse grows very slowly). For example, $A(2, 3) = 2^{16} = 65536$, and $A(3, 4)$ is an exponent tower of 2s far too tall to write out.
Note: This is one of several inequivalent but similar definitions of Ackermann’s function found in the literature. CLRS gives a different definition. Please see the CLR handout. Powerpoint doesn’t do a great job with this notation.
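The recurrence can be evaluated directly for very small arguments. The sketch below assumes the first rule reads A(1, j) = 2^j (the exponent appears to have been flattened in the transcript); the function name is an illustrative choice.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ackermann(i, j):
    """A(i, j) for i, j >= 1, under the three rules above.

    Only tiny arguments are feasible: the values explode almost at once.
    """
    if i == 1:
        return 2 ** j
    if j == 1:
        return ackermann(i - 1, 2)
    return ackermann(i - 1, ackermann(i, j - 1))


print([ackermann(2, j) for j in (1, 2, 3)])   # second row
print(ackermann(3, 1))
```

The second row already reaches 65536 at j = 3, and A(2, 4) = 2^65536 has nearly twenty thousand decimal digits. Calling `ackermann(3, 2)` is not advisable: it equals A(2, 16), which no machine can store.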

Inverse of Ackermann’s Function
$\alpha(m,n) = \min\{i \ge 1 : A(i, \lfloor m/n \rfloor) > \lg n\}$
Note: Not a “true” mathematical inverse.
Intuition: Grows about as slowly as Ackermann’s function does fast.
How slowly? Let $\lfloor m/n \rfloor = k$. $m \ge n \Rightarrow k \ge 1$. We can show that $A(i, k) \ge A(i, 1)$ for all $i \ge 1$. Consider $i = 4$: $A(4, k) \ge A(4, 1) \gg 10^{80}$. So, $\alpha(m,n) \le 4$ if $\lg n < 10^{80}$, i.e., if $n < 2^{10^{80}}$.

Bound We Establish
We establish $O(m \lg^* n)$ as an upper bound.
Recall $\lg^* n = \min\{i \ge 0 : \lg^{(i)} n \le 1\}$.
In particular: $2^{2^{2^{2^{2}}}} = 2^{65536}$. And hence: $\lg^* 2^{65536} = 5$.
Thus, $\lg^* n \le 5$ for all practical purposes.
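The iterated logarithm is easy to compute, which makes the "at most 5" remark concrete. A small sketch, with a hypothetical function name, using `math.log2` from the standard library:

```python
import math


def lg_star(n):
    """Iterated logarithm: how many times lg must be applied to n
    before the result drops to 1 or below."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


for n in (1, 2, 16, 65536, 2 ** 65536):
    print(lg_star(n))
```

The last input, 2^65536, is vastly larger than the number of atoms in the observable universe, yet its iterated logarithm is only 5.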

One-pass path compression
Hunch: We can modify the structure of the tree as each edge is processed to attempt to save on the overall run-time of the algorithm. When we do a Find, we can do a bit of “maintenance” work along the path traversed. What is happening to the depth of the tree? What impact does this work have on future Find operations?
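One common way to do this maintenance in a single pass is path halving: while walking up, make each visited node point to its grandparent. The sketch below assumes that variant (the slides do not say which one they have in mind), and the function name is hypothetical.

```python
def find_halving(parent, i):
    """Return the root of i, making every visited node point to its
    grandparent on the way up (path halving)."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]   # skip one level
        i = parent[i]
    return i


parent = [1, 2, 3, 4, 4]                # a chain 0 -> 1 -> 2 -> 3 -> 4
print(find_halving(parent, 0), parent)
```

After the call, node 0 sits two links from the root instead of four, so the path from 0 has been cut in half. Every later Find that passes through these nodes benefits, and no second pass or recursion is needed.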
