Fundamental Data Structures and Algorithms Peter Lee April 24, 2003 Union-Find
Announcements Quiz #4 is available until midnight tonight! HW6 is due next week! tournament on May 7 Final Exam on May 8, 8:30am! review session TBA
Building Mazes
Thinking about the problem Think about a grid of rooms separated by walls. Each room can be given a name. abcd efgh ijkl mnop Randomly knock out walls until we get a good maze.
Mathematical formulation A set of rooms: {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p} Pairs of adjacent rooms that have an open wall between them. For example, (a,b) and (g,k) are pairs. abcd efgh ijkl mnop
Mazes as graphs abcd efgh ijkl mnop {(a,b), (b,c), (a,e), (e,i), (i,j), (f,j), (f,g), (g,h), (d,h), (g,k), (m,n), (n,o), (k,o), (o,p), (l,p)}
Mazes as graphs {(a,b), (b,c), (a,e), (e,i), (i,j), (f,j), (f,g), (g,h), (d,h), (g,k), (m,n), (n,o), (k,o), (o,p), (l,p)} abcd efgh ijkl mnop For a maze to have a unique solution, its graph must be a tree.
Mazes as trees A spanning tree is a tree that includes all of the nodes. Why is it good to have a spanning tree? [Figure: the maze drawn as a tree over rooms a through p]
Algorithm Essentially: Randomly pick a wall and delete it (add the corresponding edge to the tree) if doing so won’t create a cycle. Stop when a spanning tree has been created. This is Kruskal’s Algorithm with the edges considered in random order.
Creating a spanning tree When adding a wall to the tree, how do we detect that it won’t create a cycle? When adding wall (x,y), we want to know if there is already a path from x to y in the tree.
Using the union-find algorithm We put rooms into an equivalence class if there is a path connecting them. Before adding an edge (x,y) to the tree, make sure that x and y are not in the same equivalence class. abcd efgh ijkl mnop Partially constructed maze
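The wall-removal loop described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the lecture's code: the class name MazeSketch, the method buildMaze, the row-major room numbering, and the bare-bones find without any balancing are all invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: rooms are numbered 0..rows*cols-1 in row-major order.
class MazeSketch {
    // Parent links of a simple forest; a negative entry marks a root.
    private final int[] parent;

    MazeSketch(int rooms) {
        parent = new int[rooms];
        Arrays.fill(parent, -1);
    }

    private int find(int x) {
        while (parent[x] >= 0) x = parent[x];
        return x;
    }

    // Returns the removed walls, each as a pair {roomA, roomB}.
    static List<int[]> buildMaze(int rows, int cols, long seed) {
        // Collect every interior wall as a pair of adjacent rooms.
        List<int[]> walls = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int room = r * cols + c;
                if (c + 1 < cols) walls.add(new int[] {room, room + 1});
                if (r + 1 < rows) walls.add(new int[] {room, room + cols});
            }
        }
        Collections.shuffle(walls, new Random(seed));

        MazeSketch sets = new MazeSketch(rows * cols);
        List<int[]> removed = new ArrayList<>();
        for (int[] w : walls) {
            int a = sets.find(w[0]);
            int b = sets.find(w[1]);
            if (a != b) {            // no path yet, so no cycle can form
                sets.parent[a] = b;  // merge the two equivalence classes
                removed.add(w);
            }
        }
        return removed;
    }
}
```

For a 4 by 4 grid the returned list always has 15 walls, one fewer than the number of rooms, because the removed walls form a spanning tree.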
Dynamic Equivalence Relations
Equivalence relations The two-place relation “~” is an equivalence relation if (for all a, b, and c): a ~ a (reflexive); a ~ b iff b ~ a (symmetric); a ~ b and b ~ c implies a ~ c (transitive)
Equivalence relations?
<<: transitive, not reflexive, not symmetric
<=: transitive, reflexive, not symmetric
e1 = O(e2): transitive, reflexive, not symmetric
==: transitive, reflexive, symmetric
connected: transitive, reflexive, symmetric
Equivalence classes For any given element x ∈ S and two-place equivalence relation ~, the equivalence class of x is { y | y ∈ S ∧ x ~ y }
Making equivalence dynamic We want operations that update an equivalence relation over time, for example as walls are removed in a maze. Operations: find(i): returns the equivalence class of i. union(i,j): joins the classes of i and j.
Dynamic equivalence
Operations: find(i) returns the name of the set containing i. union(i,j) joins the sets containing i and j.
start: {1} {2} {3} {4} {5} {6} {7}
union(2,3): {1} {2,3} {4} {5} {6} {7}
union(3,4): {1} {2,3,4} {5} {6} {7}
union(5,6): {1} {2,3,4} {5,6} {7}
union(6,3): {1} {2,3,4,5,6} {7}
Union Find
The UnionFind interface class UnionFind { UnionFind(int n) {... }; int find(int i) {... }; void union(int i, int j) {... }; } To simplify matters, use integers {0,1,2,…,n-1} to represent the set elements.
Implementing Union-Find A key question: How should we represent the equivalence classes? Let’s consider a naïve approach first, and then a better way…
A naïve array representation Array with set indexes sets: {0,1,3}, {2,5,8}, {4,6}, {7} union(1,4) yields: sets: {0,1,3,4,6}, {2,5,8}, {7}
Running time for naïve approach With this naïve representation, find(n) runs in O(1) time. What about union(n,m)?
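To make the question concrete, here is a minimal sketch of one possible naïve representation. The class name NaiveSets, the field label, and the choice of labels are assumptions for illustration, since the slide's array picture is not preserved.

```java
import java.util.Arrays;

// Hypothetical naive representation: label[i] is the name of the set containing i.
class NaiveSets {
    final int[] label;

    NaiveSets(int[] initialLabels) {
        label = Arrays.copyOf(initialLabels, initialLabels.length);
    }

    // O(1): a single array lookup.
    int find(int i) {
        return label[i];
    }

    // O(n): every element carrying the old label has to be relabelled,
    // so the whole array is scanned.
    void union(int i, int j) {
        int from = label[i];
        int to = label[j];
        if (from == to) return;
        for (int k = 0; k < label.length; k++) {
            if (label[k] == from) label[k] = to;
        }
    }
}
```

With the assumed labels {0, 0, 2, 0, 4, 2, 4, 7, 2} for the sets {0,1,3}, {2,5,8}, {4,6}, {7}, calling union(1, 4) relabels elements 0, 1 and 3, giving {0,1,3,4,6}, {2,5,8}, {7} as on the previous slide. The scan over the whole array is what makes union linear in this representation.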
Forest and tree representation Each set is a tree.
start: {1}{2}{0,3}{4}{5}
union(2,1) adds a new subtree to a root: {1,2}{0,3}{4}{5}
union(0,1) adds a new subtree to a root: {1,2,0,3}{4}{5}
Forest and trees: array representation {1,2,0,3}{4}{5} find(2) = 1 find(4) = 4
Find, v {1,2,0,3}{4}{5} find(0) = 1
public int find(int x) {
  if (s[x] < 0) return x;   // a negative entry marks a root, so x itself names the set
  return find(s[x]);        // otherwise follow the parent link
}
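Union, v union(0,2): {1,2}{0,3}{4}{5} becomes {1,2,0,3}{4}{5} s: before s’: after
public void union(int x, int y) { s[find(x)] = y; }
This links x's root under y itself rather than under y's root, so it can lengthen paths needlessly and can create a cycle when x and y are already in the same set.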
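Union, v union(0,2): {1,2}{0,3}{4}{5} becomes {1,2,0,3}{4}{5} s: before s’: after
public void union(int x, int y) {
  int rx = find(x), ry = find(y);
  if (rx != ry) s[rx] = ry;   // link root to root; do nothing if already in the same set
}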
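Putting the find and union from these slides together gives a complete, if slow, forest implementation. The sketch below is illustrative only: the class name SimpleForest is an assumption, and the guard against joining a set with itself is added here to keep a root from pointing at itself.

```java
import java.util.Arrays;

// Hypothetical class name; the array s and the method bodies follow the slides,
// plus a guard against joining a set with itself.
class SimpleForest {
    final int[] s;

    SimpleForest(int n) {
        s = new int[n];
        Arrays.fill(s, -1);   // every element starts as its own root
    }

    int find(int x) {
        if (s[x] < 0) return x;   // x is a root
        return find(s[x]);        // walk up one parent link
    }

    void union(int x, int y) {
        int rx = find(x);
        int ry = find(y);
        if (rx != ry) s[rx] = ry; // hang x's tree under y's root
    }
}
```

On six elements, the calls union(3, 0), union(2, 1) and union(0, 1) leave find(0) and find(2) both equal to 1 and find(4) equal to 4, matching the example sets {1,2,0,3}{4}{5}. The first call is an assumed way of building the starting set {0,3}.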
Union v.0 is still O(n)! Find must walk the path to the root. Unlucky combinations of unions can result in long paths.
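One unlucky combination is easy to reproduce. The sketch below is a hypothetical demonstration (the class and method names are invented): it repeatedly unions element 0 with a fresh singleton using the unweighted rule s[find(x)] = find(y), then counts how many parent links a find on element 0 has to follow.

```java
import java.util.Arrays;

// Hypothetical demo of a worst-case union sequence for the unweighted forest.
class ChainDemo {
    // Performs union(0,1), union(0,2), ..., union(0,n-1) with the rule
    // s[find(x)] = find(y), then returns the number of links from 0 to its root.
    static int linksFollowedFromZero(int n) {
        int[] s = new int[n];
        Arrays.fill(s, -1);
        for (int k = 1; k < n; k++) {
            int rx = root(s, 0);
            int ry = root(s, k);
            if (rx != ry) s[rx] = ry;   // the whole old tree goes under singleton k
        }
        int links = 0;
        for (int x = 0; s[x] >= 0; x = s[x]) links++;
        return links;
    }

    private static int root(int[] s, int x) {
        while (s[x] >= 0) x = s[x];
        return x;
    }
}
```

Each union puts the existing tree beneath a new root, so the forest degenerates into a single chain 0, 1, 2, and so on, and a find on element 0 follows n - 1 links.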
Trick 1: union by height Union shallow trees into deep trees. Tree depth increases only when the two depths are equal. Track each tree's height (its longest path to the root). Tree depth is O(log N).
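A minimal sketch of union by height follows. It is not the lecture's code: the class name HeightForest and the separate height array are assumptions chosen for readability, whereas the code later in these slides packs its bookkeeping into the parent array and tracks size instead.

```java
import java.util.Arrays;

// Hypothetical union-by-height forest with an explicit height array.
class HeightForest {
    final int[] parent;  // parent[x] < 0 marks a root
    final int[] height;  // height[r] is meaningful only while r is a root

    HeightForest(int n) {
        parent = new int[n];
        height = new int[n];     // singletons have height 0
        Arrays.fill(parent, -1);
    }

    int find(int x) {
        while (parent[x] >= 0) x = parent[x];
        return x;
    }

    void union(int x, int y) {
        int rx = find(x);
        int ry = find(y);
        if (rx == ry) return;
        if (height[rx] < height[ry]) {
            parent[rx] = ry;         // shallow tree goes under the deep tree
        } else if (height[rx] > height[ry]) {
            parent[ry] = rx;
        } else {
            parent[ry] = rx;         // equal heights: the result is one level deeper
            height[rx]++;
        }
    }
}
```

With this rule, the sequence union(0,1), union(0,2), and so on up to union(0,7) never pushes the height past 1, because each new singleton is hung under the existing root.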
Trick 1’: union by size union small trees into big trees (Tree size always increases) Track subtree size Tree depth at most ???
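One way to approach the depth question is the standard doubling argument, sketched here as a hedged aside rather than as the lecture's own derivation. The symbols d(x) for the depth of node x and n(x) for the size of the tree containing x are introduced only for this sketch.

```latex
% Assumed notation: d(x) = depth of node x, n(x) = size of the tree containing x,
% N = total number of elements.
%
% Under union by size, d(x) grows by 1 only when x's tree is merged into a tree
% that is at least as large, so the size of x's tree at least doubles:
\[
  d'(x) = d(x) + 1 \quad\Longrightarrow\quad n'(x) \ge 2\,n(x).
\]
% Starting from d(x) = 0 and n(x) = 1, induction on the number of such merges gives
\[
  n(x) \ge 2^{d(x)}.
\]
% Since no tree can hold more than N elements,
\[
  2^{d(x)} \le n(x) \le N \quad\Longrightarrow\quad d(x) \le \log_2 N .
\]
```

This agrees with the O(log N) worst-case bound for find that the time-bounds summary gives for union by size.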
Trick 2: Path compression find flattens trees Redirect nodes to point directly to the root Example: find(0) Do this whenever traversing a path from node to root
Path compression find flattens trees. Redirect nodes to point directly to the root. Do this whenever traversing a path from node to root.
public int find(int x) {
  if (s[x] < 0) return x;
  return s[x] = find(s[x]);   // re-point x at the root on the way back
}
This implies that union does path compression (through its calls to find).
The Code
All the code
class UnionFind {
  int[] u;   // u[r] < 0: r is a root and -u[r] is its set's size; otherwise u[i] is i's parent
  UnionFind(int n) { u = new int[n]; for (int i = 0; i < n; i++) u[i] = -1; }
  int find(int i) {
    int j, root;
    for (j = i; u[j] >= 0; j = u[j]) ;   // first pass: locate the root
    root = j;
    while (u[i] >= 0) { j = u[i]; u[i] = root; i = j; }   // second pass: compress the path
    return root;
  }
  void union(int i, int j) {
    i = find(i); j = find(j);
    if (i != j) {
      if (u[i] < u[j]) { u[i] += u[j]; u[j] = i; }   // i's set is larger: j goes under i
      else { u[j] += u[i]; u[i] = j; }
    }
  }
}
The UnionFind class class UnionFind { int[] u; UnionFind(int n) { u = new int[n]; for (int i = 0; i < n; i++) u[i] = -1; } int find(int i) {... } void union(int i,int j) {... } }
Trick 2: Iterative find int find(int i) { int j, root; for (j = i; u[j] >= 0; j = u[j]) ; root = j; while (u[i] >= 0) { j = u[i]; u[i] = root; i = j; } return root; }
Trick 1’: union by size
void union(int i, int j) {
  i = find(i); j = find(j);
  if (i != j) {
    if (u[i] < u[j]) { u[i] += u[j]; u[j] = i; }   // i's set is larger: j goes under i
    else { u[j] += u[i]; u[i] = j; }               // otherwise i goes under j
  }
}
Time bounds
Variables: M operations, N elements.
Simple forest representation. Worst: find O(N), mixed operations O(MN). Average: tricky.
Union by height; union by size. Worst: find O(log N), mixed operations O(M log N). Average: mixed operations O(M) [see text].
Path compression in find. Worst: mixed operations “nearly linear” [analysis in ].