Union-Find.

Union-Find

Example: Computer Networks
How do you:
- Discover if a node can reach another node?
- Determine how to route messages to nodes not directly connected?
- Control access to a common resource?
- ...
The answers draw on both data structures and algorithms.

The Connectivity Problem (in the context of self-configurable wireless networks) Who’s out there? A node i can only send messages to every other node j if the network is connected.

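Three Mathematical Constructs
Graph: G = (V, E), where V is the set of vertices and E is the set of edges.
Tree: a connected acyclic graph.
Spanning tree: a tree T = (V', E') for which V' = V and E' ⊆ E is called a spanning tree on G. Note that a spanning tree on G has |V| - 1 edges.
[Figure: example graph on nine numbered vertices.]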

Constructing a Spanning Tree
Generic-ST(G)
1  A ← ∅
2  while A is not a ST
3    do find an edge (u,v) that is safe for A
4       A ← A ∪ {(u,v)}
5  return A
[Figure: example graph on nine numbered vertices.]
Invariant: Prior to each iteration, A is a subset of some spanning tree.
This is an ALGORITHM, a systematic description of the steps toward the solution. Because it’s generic, we haven’t specified any data structure to work with; one can assume that anything that works is good enough. Will that be true? What about efficiency? Some data structures will be better suited to this task than others.
This is a GREEDY algorithm:
- the strategy is to pick the choice that is BEST at the moment.
- the drawback is that, in general, you’re not guaranteed to find optimal solutions.
Note how the algorithm depends on two operations: find and union.

Union-Find Algorithm
Input   Output
3-4     3-4
4-9     4-9
8-0     8-0
2-3     2-3
5-6     5-6
2-9
5-9     5-9
7-3     7-3
4-8     4-8
5-6
0-2
6-1     6-1
Pre-condition: each node is by itself in a set of size 1.
Invariant: all nodes in a set are connected.
Loop over the input pairs (p,q):
First step: find the set A that contains p and the set B that contains q.
Second step: since there is an edge (p,q), by transitivity all nodes in A and B are connected. Replace A and B by a new set computed as (A ∪ B).
A pair appears in the output only when it joins two sets that were separate before; pairs whose endpoints are already in the same set (2-9, the second 5-6, 0-2) produce no output.
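The two steps above can be sketched directly with Python sets, before any efficient representation is chosen. This is only an illustration: the function name `connectivity`, the `set_of` dictionary, and the assumption that the nodes are numbered 0 through 9 (the input pairs mention node 0) are not from the slides.

```python
def connectivity(pairs, n):
    """Return the pairs that join two previously separate sets."""
    # Pre-condition: each node is by itself in a set of size 1.
    set_of = {i: {i} for i in range(n)}
    output = []
    for p, q in pairs:
        a, b = set_of[p], set_of[q]   # first step: find A and B
        if a is b:
            continue                  # p and q are already connected
        merged = a | b                # second step: replace A and B by A ∪ B
        for node in merged:
            set_of[node] = merged
        output.append((p, q))
    return output


# Input column from the slide (assumed node labels 0..9).
PAIRS = [(3, 4), (4, 9), (8, 0), (2, 3), (5, 6), (2, 9),
         (5, 9), (7, 3), (4, 8), (5, 6), (0, 2), (6, 1)]
OUTPUT = connectivity(PAIRS, 10)
```

Rebuilding the merged set on every union is deliberately naive; the array-based representations that follow are different ways of making these two steps cheaper.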

Union-Find Algorithm
[Animation frames: the same input/output table, with the sets merging one input pair at a time.]
Great, but how do we represent sets?

Data-Structure for Union-Find
[Figure: the array id, with one entry per node.]
This is “just” an array of integer values. Each entry i corresponds to a node in the graph. The value in each entry, id[i], points to another node in the graph. With this simple arrangement, we can implement the concept of sets.
Remember: we only want to record which nodes are connected. If we learn that i is connected to j, we put them in the same set.
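A minimal sketch of this array, assuming nodes are numbered from 0. The helper names `make_ids` and `groups` are hypothetical; `groups` recovers the sets by following pointers until it reaches an entry that points to itself.

```python
from collections import defaultdict


def make_ids(n):
    """Every node starts in its own set: id[i] == i."""
    return list(range(n))


def groups(ids):
    """List the sets encoded by the array, following pointers to a fixed point."""
    by_rep = defaultdict(list)
    for i in range(len(ids)):
        r = i
        while ids[r] != r:      # follow pointers until a self-reference
            r = ids[r]
        by_rep[r].append(i)
    return sorted(by_rep.values())
```

For example, after `ids = make_ids(5)` and `ids[3] = 4`, nodes 3 and 4 share a set and every other node is still alone.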

The quick-find solution to connectivity problem
[Animation frames: the id array and the resulting sets after each input pair is processed.]

The quick-find solution to connectivity problem
Start with the array such that id[i]=i, for all i.
repeat
  Read one edge, values p and q.
  Find: p and q are already connected exactly when id[p]=id[q].
  Union: if they are not, change every entry with id[i]=id[p] to id[q], for all i (using the value id[p] had before the pass).
until (there are no more edges to read in)
What is the cost of each of these steps?
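A minimal sketch of quick-find as it is usually stated: two nodes are connected exactly when their id entries are equal, and a union relabels every entry of one set. The function name `quick_find` and the 0-based node numbering are assumptions made for illustration.

```python
def quick_find(pairs, n):
    """Process the pairs; return the final id array and the output pairs."""
    ids = list(range(n))            # id[i] = i for all i
    output = []
    for p, q in pairs:
        if ids[p] == ids[q]:        # find: one comparison of two array entries
            continue
        old, new = ids[p], ids[q]
        for i in range(n):          # union: a full pass over the array
            if ids[i] == old:
                ids[i] = new
        output.append((p, q))
    return ids, output


PAIRS = [(3, 4), (4, 9), (8, 0), (2, 3), (5, 6), (2, 9),
         (5, 9), (7, 3), (4, 8), (5, 6), (0, 2), (6, 1)]
FINAL_IDS, OUTPUT = quick_find(PAIRS, 10)
```

The full pass inside the union is what makes each union cost time proportional to the number of nodes.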

Analysis of the quick-find solution
Find operation: trivial; accesses known positions in the array.
Union operation: requires a pass over the entire array.
If N is the number of vertices and M is the number of union operations performed:
- for each union, we iterate the loop N times,
- we perform a total of MN loop iterations,
- M is at most as large as the number of edges in the input.

The Quick-union solution to the connectivity problem
[Animation frames: the forest of trees after each input pair is processed.]

Analysis of the Quick-union solution Find operation: non-trivial. Starting from some node in the tree, we traverse the path upward until we find its root. This is done twice for an edge (p,q): once for p and once for q. Union operation: Trivial. To compute the union of two sets, all we have to do is change one pointer. Questions: How does it compare to Quick-find in terms of memory utilization? And in terms of execution time?
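A minimal sketch of quick-union under one common convention, which the slides do not spell out: the root found for p is made to point to the root found for q. The names `root` and `quick_union` are hypothetical.

```python
def root(ids, i):
    """Find: traverse the path upward until a node points to itself."""
    while ids[i] != i:
        i = ids[i]
    return i


def quick_union(pairs, n):
    """Process the pairs; return the final id array and the output pairs."""
    ids = list(range(n))
    output = []
    for p, q in pairs:
        rp, rq = root(ids, p), root(ids, q)   # find is done twice per edge
        if rp == rq:
            continue
        ids[rp] = rq                          # union: change one pointer
        output.append((p, q))
    return ids, output


PAIRS = [(3, 4), (4, 9), (8, 0), (2, 3), (5, 6), (2, 9),
         (5, 9), (7, 3), (4, 8), (5, 6), (0, 2), (6, 1)]
FINAL_IDS, OUTPUT = quick_union(PAIRS, 10)
```

Memory use is the same single array as in quick-find; what changes is where the time is spent, since the loop has moved from union into find.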

Analysis of the Quick-union solution
Claim: For a number of edges M>N in the input, Quick-union may traverse a number of pointers that is quadratic in N, the number of nodes.
Proof: Assume that the first N-1 pairs in the input are (0-1), (0-2), (0-3), (0-4), ..., (0-(N-1)). (Don’t count self-references.)
Find for (0-1) traverses 0 pointers.
Find for (0-2) traverses 1 pointer.
Find for (0-3) traverses 2 pointers.
...
Find for (0-(N-1)) traverses N-2 pointers.
Number of pointers traversed for N-1 pairs: (0+1+2+...+(N-2)) = (N-1)(N-2)/2.
[Figure: the resulting chain, with N-1 at the root and the remaining nodes N-2, ..., 3, 2, 1 hanging below it in order.]
Regardless of what the remaining pairs are, the number of pointers traversed will be “bounded from below” by (N-1)(N-2)/2, which is quadratic in N.
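The count in this proof can be checked by instrumenting quick-union. The sketch below assumes the convention that the root of p is linked under the root of q, and it counts only pointers that lead to a different node; the exact total depends on whether the root's self-reference is counted. Under these assumptions the total is (N-1)(N-2)/2, which is quadratic in N. The function name `pointers_followed` is hypothetical.

```python
def pointers_followed(n):
    """Total pointers traversed by quick-union on (0-1), (0-2), ..., (0-(n-1))."""
    ids = list(range(n))
    total = 0
    for q in range(1, n):
        roots = []
        for i in (0, q):            # one find for each endpoint of the pair
            while ids[i] != i:
                i = ids[i]
                total += 1          # one pointer followed
            roots.append(i)
        ids[roots[0]] = roots[1]    # link the root of 0's tree under q
    return total
```

Doubling n roughly quadruples the result, which is the quadratic growth the claim describes.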

An Improvement to Quick-union
The goal is to make Find faster by keeping paths shorter. Say we have two trees with depths n and m, such that n < m. How can we best combine the two trees (union) so that the depth of the new tree is minimized?
[Figure: the two ways of linking a tree of depth n and a tree of depth m.]

An Improvement to Quick-union
What if the information we have is the number of nodes in each tree rather than their depths?
Claim: any tree with k nodes (size=k) constructed by our algorithm will have depth ≤ lg k.
Proof (by induction):
Base case: at the start, we have sets of size 1 (depth = 0 = lg 1).
Induction hypothesis: every tree with i < k nodes has depth ≤ lg i.
Induction step: the union of two trees with n and m nodes, such that n ≤ m, hangs the n-node tree below the root of the m-node tree, so it has depth at most max(lg m, 1 + lg n) = max(lg m, lg 2n) ≤ lg(n + m).

Weighted Quick-union
Data structures: two arrays, id and size, each with one entry per node.
Each entry i corresponds to a node in the graph. The value in each entry, id[i], points to another node in the graph; size[i] tells us how many nodes that tree contains.
Algorithm: same as for Quick-union, except that we always connect the smaller tree to the larger tree.
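A minimal sketch of weighted quick-union with the two arrays described above. The function names and the tie-breaking choice (when sizes are equal, q's root goes under p's root) are assumptions; `size` is only meaningful at root entries.

```python
def weighted_quick_union(pairs, n):
    """Process the pairs; return the id array, the size array, and the output."""
    ids = list(range(n))
    size = [1] * n                    # size[r] is valid only while r is a root

    def root(i):
        while ids[i] != i:
            i = ids[i]
        return i

    output = []
    for p, q in pairs:
        rp, rq = root(p), root(q)
        if rp == rq:
            continue
        if size[rp] < size[rq]:
            rp, rq = rq, rp           # make rp the root of the larger tree
        ids[rq] = rp                  # connect the smaller tree to the larger
        size[rp] += size[rq]
        output.append((p, q))
    return ids, size, output


def depth(ids, i):
    """Number of pointers between node i and its root."""
    d = 0
    while ids[i] != i:
        i = ids[i]
        d += 1
    return d


PAIRS = [(3, 4), (4, 9), (8, 0), (2, 3), (5, 6), (2, 9),
         (5, 9), (7, 3), (4, 8), (5, 6), (0, 2), (6, 1)]
IDS, SIZE, OUTPUT = weighted_quick_union(PAIRS, 10)
```

The set of output pairs is the same as for quick-union; only the shape of the trees changes.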

Analysis of Weighted Quick-union What is the cost of Find for each edge in the input? How has it changed from Quick-union? If the graph has N nodes and the input has M edges, what is the worst-case cost of this algorithm?

Worst-case for Weighted Quick-union Think of an input pattern that causes every union to link together trees of equal size (which is a power of 2). Every union operation causes the depth of the tree to increase by one.
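One such input pattern can be generated and measured. The sketch below assumes N = 2^k nodes and merges equal-sized trees in rounds (pairs of single nodes, then pairs of 2-node trees, and so on); the function name `worst_case_depth` is hypothetical.

```python
def worst_case_depth(k):
    """Depth of the weighted quick-union tree built on 2**k nodes by equal-size merges."""
    n = 2 ** k
    ids = list(range(n))
    size = [1] * n

    def root(i):
        while ids[i] != i:
            i = ids[i]
        return i

    def union(p, q):
        rp, rq = root(p), root(q)
        if rp == rq:
            return
        if size[rp] < size[rq]:
            rp, rq = rq, rp
        ids[rq] = rp
        size[rp] += size[rq]

    step = 1
    while step < n:                       # one round per doubling of tree size
        for i in range(0, n, 2 * step):
            union(i, i + step)            # both trees have exactly `step` nodes
        step *= 2

    def depth(i):
        d = 0
        while ids[i] != i:
            i = ids[i]
            d += 1
        return d

    return max(depth(i) for i in range(n))
```

Each round adds one level, so the final depth is k = lg N, matching the depth ≤ lg k bound for trees built by weighted quick-union.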

Further Improvements?
Ideal: If every node were made to point directly to the root, unions would be trivial. Does this sound familiar? What cost did we pay when we attempted this?
[Figure: the trees before and after processing the pair (4-8).]

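Two Heuristics
1) Union by Rank: store the rank of the tree in its representative. Rank ≈ tree size. Make the root with smaller rank point to the root with larger rank.
2) Path Compression: during Find-Set, “flatten” the tree.
[Figure: Find-Set(a) on the chain a → b → c → d; afterward a, b and c all point directly to d.]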

Operations

Make-Set(x)
  p[x] := x;
  rank[x] := 0

Link(x, y)
  if rank[x] > rank[y]
  then p[y] := x
  else p[x] := y;
       if rank[x] = rank[y]
       then rank[y] := rank[y] + 1
       fi
  fi

Find-Set(x)
  if x ≠ p[x]
  then p[x] := Find-Set(p[x])
  fi;
  return p[x]

Union(x, y)
  Link(Find-Set(x), Find-Set(y))

Note: rank = upper bound on height.
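The four operations translate almost line for line into Python. This is a sketch, not a tuned implementation: the class name `DisjointSets` and the dictionary storage are assumptions, and the early return in `link` when both arguments are the same root is an addition that the slide's pseudocode does not have.

```python
class DisjointSets:
    """Union by rank with path compression."""

    def __init__(self):
        self.p = {}       # parent pointers
        self.rank = {}    # upper bound on the height of each root's tree

    def make_set(self, x):
        self.p[x] = x
        self.rank[x] = 0

    def link(self, x, y):
        """Link two roots; the root of smaller rank points to the other."""
        if x == y:
            return                        # added guard: already the same set
        if self.rank[x] > self.rank[y]:
            self.p[y] = x
        else:
            self.p[x] = y
            if self.rank[x] == self.rank[y]:
                self.rank[y] += 1

    def find_set(self, x):
        """Return the representative of x, flattening the path on the way back."""
        if x != self.p[x]:
            self.p[x] = self.find_set(self.p[x])
        return self.p[x]

    def union(self, x, y):
        self.link(self.find_set(x), self.find_set(y))
```

A typical use is to call `make_set` once per node, call `union(p, q)` for each edge, and compare `find_set(a) == find_set(b)` to test connectivity. The recursive `find_set` mirrors the pseudocode; for very deep trees an iterative version would avoid Python's recursion limit.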

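Find-Set
Trace of F-S(a) on the chain a → b → c:
F-S(a) sets p[a] := F-S(b), which sets p[b] := F-S(c), which returns c.
[Figure: before the call, a points to b and b points to c; afterward a and b both point directly to c.]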

Time Complexity
Tight upper bound on time complexity: O(m α(m,n)).
α(m,n) = inverse of Ackermann’s function (almost a constant).
A slightly easier bound of O(m lg* n) is established in CLR.

Ackermann’s Function
A(1, j) = 2^j                    for j ≥ 1
A(i, 1) = A(i-1, 2)              for i ≥ 2
A(i, j) = A(i-1, A(i, j-1))      for i, j ≥ 2
Grows very fast (inverse grows very slowly).
[The slide’s expansion of A(3, 4) and its tower notation were drawn as figures and did not survive extraction.]
Note: This is one of several inequivalent but similar definitions of Ackermann’s function found in the literature. CLRS gives a different definition. Please see the CLR handout. PowerPoint doesn’t do a great job with this notation.
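The three rules of this particular definition can be transcribed directly, which makes the growth rate easy to see for the few arguments that are still computable. The function name `ackermann` is an assumption, and the sketch should only be called with very small arguments.

```python
def ackermann(i, j):
    """A(i, j) for i, j >= 1, following the three rules of the definition above."""
    if i == 1:
        return 2 ** j                                 # A(1, j) = 2^j
    if j == 1:
        return ackermann(i - 1, 2)                    # A(i, 1) = A(i-1, 2)
    return ackermann(i - 1, ackermann(i, j - 1))      # A(i, j) = A(i-1, A(i, j-1))
```

Under this definition `ackermann(2, 3)` is already 2^16, and `ackermann(3, 2)` equals A(2, 16), far beyond anything that can be evaluated.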

Inverse of Ackermann’s Function
α(m,n) = min{i ≥ 1 : A(i, ⌊m/n⌋) > lg n}
Note: Not a “true” mathematical inverse.
Intuition: Grows about as slowly as Ackermann’s function does fast.
How slowly? Let ⌊m/n⌋ = k. Then m ≥ n ⇒ k ≥ 1.
We can show that A(i, k) ≥ A(i, 1) for all i ≥ 1.
Consider i = 4: A(4, k) ≥ A(4, 1) = A(3, 2) = A(2, 16) ≫ 10^80.
So, α(m,n) ≤ 4 if lg n < 10^80, i.e., if n < 2^(10^80).

Bound We Establish
We establish O(m lg* n) as an upper bound.
Recall lg* n = min{i ≥ 0 : lg^(i) n ≤ 1}, where lg^(i) denotes lg applied i times.
In particular, lg* 2^65536 = 5.
Thus, lg* n ≤ 5 for all practical purposes.
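The iterated logarithm is easy to compute by applying lg repeatedly until the value drops to 1 or below. A small sketch, with the hypothetical name `lg_star`; it relies on `math.log2` accepting arbitrarily large Python integers.

```python
import math


def lg_star(n):
    """Smallest i >= 0 such that applying lg to n i times gives a value <= 1."""
    i = 0
    x = n
    while x > 1:
        x = math.log2(x)
        i += 1
    return i
```

The values 2, 4, 16, 65536 and 2^65536 map to 1, 2, 3, 4 and 5 respectively, so any input size that fits in a physical machine has lg* at most 5.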

One-pass path compression Hunch: We can modify the structure of the tree as each edge is processed to attempt to save on the overall run-time of the algorithm. When we do a Find, we can do a bit of “maintenance” work along the path traversed. What is happening to the depth of the tree? What impact does this work have on future Find operations?
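One common way to do this maintenance in a single pass is path halving, where each node visited during Find is re-pointed to its grandparent. Whether this is the exact variant the slides intend is an assumption, and the name `find_halving` is hypothetical.

```python
def find_halving(ids, i):
    """Find the root of i, pointing every other node on the path at its grandparent."""
    while ids[i] != i:
        ids[i] = ids[ids[i]]    # skip one level: i now points to its grandparent
        i = ids[i]
    return i
```

On the chain 0 → 1 → 2 → 3 → 4, one call `find_halving(ids, 0)` returns 4 and leaves node 0 two pointers from the root instead of four, so later Find operations on the same path are cheaper.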