15-211 Fundamental Data Structures and Algorithms Klaus Sutner April 20, 2004 Equivalence and Union-Find.


Announcements
- HW7: make sure to get your games in.
- Quiz 3: Thursday, April 22.
- Final Exam: Tuesday, May 4, 5:30 pm.
- Review session: April 29.

Chameleon Island

On a tropical island there are three kinds of chameleons wandering about: red, green and blue. If a red and a green chameleon meet, they both change color to blue; likewise, a red and a blue one both turn green, and a green and a blue one both turn red. Initially there are 12 red, 13 green and 14 blue chameleons. Can the chameleons turn into a homogeneous population?

Brute Force We can compute this to death: use a digraph with nodes (r,g,b) and edges
(r,g,b) → (r-1,g-1,b+2)
(r,g,b) → (r-1,g+2,b-1)
(r,g,b) → (r+2,g-1,b-1)
provided that the numbers are non-negative. How many nodes are there? The starting configuration is (12,13,14), so the total number of animals is n = 39.

Reachability The number of nodes is C(39+2,2) = 820, the number of ways to write 39 as an ordered sum r + g + b of non-negative integers. We can simply use DFS or BFS to compute the nodes reachable from (12,13,14) and check if we run into one of (39,0,0), (0,39,0), (0,0,39). It turns out that we don't. This works, but it is rather crude. Is there a more elegant solution? How about invariants?
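The brute-force search can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `ChameleonSearch` and the method name `canBecomeHomogeneous` are made up for this example, and a configuration is stored as the pair (r, g) because b = n - r - g is determined by the other two counts.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of the lecture code.
class ChameleonSearch {
    // Returns true if a configuration with all animals of one color
    // is reachable from (r0, g0, b0) by the three meeting rules.
    static boolean canBecomeHomogeneous(int r0, int g0, int b0) {
        int n = r0 + g0 + b0;
        // b = n - r - g, so (r, g) identifies a node of the digraph.
        boolean[][] seen = new boolean[n + 1][n + 1];
        Deque<int[]> queue = new ArrayDeque<>();
        seen[r0][g0] = true;
        queue.add(new int[] {r0, g0});
        // The three edge types as changes to (r, g, b).
        int[][] moves = {{-1, -1, 2}, {-1, 2, -1}, {2, -1, -1}};
        while (!queue.isEmpty()) {
            int[] c = queue.poll();
            int r = c[0], g = c[1], b = n - r - g;
            if (r == n || g == n || b == n) return true;
            for (int[] m : moves) {
                int nr = r + m[0], ng = g + m[1], nb = b + m[2];
                if (nr < 0 || ng < 0 || nb < 0) continue; // counts must stay non-negative
                if (!seen[nr][ng]) {
                    seen[nr][ng] = true;
                    queue.add(new int[] {nr, ng});
                }
            }
        }
        return false;
    }
}
```

Calling `ChameleonSearch.canBecomeHomogeneous(12, 13, 14)` runs the BFS over the reachable part of the 820-node graph.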

Invariants If we suspect that some configuration cannot occur, we can try to prove this by finding some property P such that: - P holds on the initial configuration, - P is preserved in every single transition of the system, - P does not hold on the specific target configuration. Your favorite method: Induction.

Information Hiding For the chameleons, the key observation is that modulo 3 the three types of edges are all the same, since -1 = 2 mod 3: (r,g,b) → (r+2,g+2,b+2) mod 3. Note that this quotient operation preserves paths, so it suffices to observe (0,1,2) → (2,0,1) → (1,2,0) → (0,1,2). All three target configurations are (0,0,0) modulo 3, and that triple never occurs on this cycle. Of course, we lose a lot of information, but this is enough to answer the original question.
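The mod 3 argument can also be checked mechanically. The sketch below is an illustration only; `ChameleonInvariant`, `orbitMod3` and `invariantAllowsHomogeneous` are hypothetical names, and inputs are assumed to be non-negative. Note that the invariant gives only a necessary condition: a `false` answer proves the target is unreachable, while a `true` answer proves nothing by itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the lecture code.
class ChameleonInvariant {
    // Residue triples that can occur: every move acts as +2 on each coordinate mod 3.
    static List<int[]> orbitMod3(int r, int g, int b) {
        List<int[]> orbit = new ArrayList<>();
        int[] cur = {r % 3, g % 3, b % 3};
        for (int step = 0; step < 3; step++) { // three moves add 6 = 0 mod 3, so the orbit closes
            orbit.add(cur);
            cur = new int[] {(cur[0] + 2) % 3, (cur[1] + 2) % 3, (cur[2] + 2) % 3};
        }
        return orbit;
    }

    // False means: no homogeneous configuration is reachable.
    static boolean invariantAllowsHomogeneous(int r, int g, int b) {
        int t = (r + g + b) % 3; // residue of n, the only nonzero coordinate of a target
        int[][] targets = {{t, 0, 0}, {0, t, 0}, {0, 0, t}};
        for (int[] residue : orbitMod3(r, g, b))
            for (int[] target : targets)
                if (Arrays.equals(residue, target)) return true;
        return false;
    }
}
```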

Equivalence Relations There is an important idea hiding here: identify objects that are distinct but share some property. Modeled by a binary relation ~ on some carrier set A.
reflexive: x ~ x
symmetric: x ~ y ⇒ y ~ x
transitive: x ~ y ∧ y ~ z ⇒ x ~ z

Examples
- congruence modulo m
- polygons of the same area
- people of the same age
- vertices reachable from each other in an undirected graph
- programs with the same input/output behavior

Classes and Quotients Equivalence class of x: [x] = { y | y ~ x } Quotient: A/~ = { [x] | x in A } Index of ~: cardinality of A/~ Note that equivalence classes form a partition of A. In fact, partitions and equivalence relations are essentially the same.

Kernel Relations Given any function f : A → B we can form a relation K(f) by defining x K(f) y iff f(x) = f(y). Note that K(f) is always an equivalence relation. If R = K(f) we say that f is a (kernel) representation for R. (As opposed to a list of pairs, an adjacency matrix, … ).

Everybody is a Kernel Claim: All equivalence relations are of this form. In fact we can choose a function f : A → A. This is intuitively clear: we map all x in an equivalence class to some special member of that class. (Take a course in set theory if you want to know why there are problems with this.)

Computational Aspects The last observation allows us to represent an equivalence relation on [n] = {1,2,...,n} compactly: Instead of n^2 bits for a Boolean matrix representation we only need n integers for an array representing f. We can still check if two elements are equivalent in O(1) time. What is a good choice for the function f?

The Canonical Representation f(x) = min{ z | z ~ x }. For example, if x ~ y iff x = y mod 3 on [10] we get
x     1  2  3  4  5  6  7  8  9  10
f(x)  1  2  3  1  2  3  1  2  3  1

Index Question: How does one compute the index of R from a kernel representation for R?
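One possible answer, as a sketch: the index is the number of distinct values that f takes (for the canonical representation this is the same as the number of x with f(x) = x). The class `KernelRep` and its method names are assumptions made for this illustration; arrays are 1-based with index 0 unused, to match [n] = {1,...,n}.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the lecture code.
class KernelRep {
    // Canonical kernel array for "x = y mod m" on {1,...,n}, assuming 1 <= m <= n.
    static int[] canonicalMod(int n, int m) {
        int[] f = new int[n + 1];
        for (int x = 1; x <= n; x++) {
            int r = x % m;
            f[x] = (r == 0) ? m : r; // least element of [n] congruent to x
        }
        return f;
    }

    // O(1) equivalence test on any kernel representation.
    static boolean equivalent(int[] f, int x, int y) {
        return f[x] == f[y];
    }

    // Index of the relation: number of distinct values of f on {1,...,n}.
    static int index(int[] f) {
        Set<Integer> values = new HashSet<>();
        for (int x = 1; x < f.length; x++) values.add(f[x]);
        return values.size();
    }
}
```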

Refinement Suppose we have two equivalence relations R and S on [n], both given by their canonical kernel functions. How do we compute their intersection, defined by x int(R,S) y iff x R y and x S y? In other words, we want to compute the canonical representation for T = int(R,S). [Example: table with rows R, S and T]

Code
initialize hashmap H;
for x = 1,...,n do
    if( H( (R[x],S[x]) ) is undefined ) then
        T[x] = H( (R[x],S[x]) ) = x;
    else
        T[x] = H( (R[x],S[x]) );
Expected linear time. Could also replace H by an n × n array (interesting if the initialization cost can be amortized).
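A Java rendering of this pseudocode might look as follows. It is a sketch under assumptions: the class name `Refinement` is invented, arrays are 1-based with index 0 unused, and the pair (R[x], S[x]) is packed into a single `long` key rather than a pair object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the lecture code.
class Refinement {
    // r and s are canonical kernel arrays on {1,...,n}; returns the one for int(R,S).
    static int[] intersect(int[] r, int[] s) {
        int n = r.length - 1;
        int[] t = new int[n + 1];
        Map<Long, Integer> h = new HashMap<>();
        for (int x = 1; x <= n; x++) {
            long key = (long) r[x] * (n + 1) + s[x]; // encodes the pair (R[x], S[x])
            Integer rep = h.putIfAbsent(key, x);     // null if the pair was not seen before
            t[x] = (rep == null) ? x : rep;
        }
        return t;
    }
}
```

Because x runs in increasing order, the first x seen for each pair is the least member of its class, so the result is again canonical.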

Small Machines

Recall: Finite State Machines Recall that a finite state machine is essentially a lookup table with one entry for each symbol/state combination, plus an initial state and some final states.

An Experiment Think of the finite state machine as a black box. Suppose you can perform the following experiment as often as you wish: - reset the machine to some state p, - feed some string to the machine, and - observe whether the resulting state is final. Of course, you are not allowed to open up the machine. Which states could be distinguished from each other by this experiment?

A Black Box Call p and q (behaviorally) equivalent if they cannot be distinguished. Claim: 1. We can distinguish final from non-final states. 2. If we can distinguish p and q and d(p',a) = p and d(q',a) = q then we can also distinguish p' and q'.

Who Cares? If two states are equivalent, we may as well collapse them into a single state. More precisely, we can replace the state set Q by Q/~. The latter may be much smaller, so we can build potentially smaller machines. Fact: One can show that the smallest possible finite state machine (for a given language) can be obtained this way.

Example [figure: state diagram with transitions labeled a and b]

Computing Behavioral Equiv. How do we actually compute the behavioral equivalence relation ~? Refine partitions. Initially only distinguish between F and Q – F. Then refine the partition as follows: Suppose we have an equivalence relation E. Define E' by p E' q iff p E q and for all symbols s: d(p,s) E d(q,s).

Computing Behavioral Equiv. But that's just an intersection operation: define p E_s q iff d(p,s) E d(q,s). Then E' = int( E, E_a, E_b, ... ). When E' = E for the first time we have E = ~. This can be computed in O(k n^2) steps, where n is the number of states and k the number of input symbols.
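The refinement loop can be sketched as follows. This is an illustration, not the lecture's code: the class `BehavioralEquivalence` and the input encoding (states 0..n-1, symbols 0..k-1, `delta[p][a]` for d(p,a)) are assumptions. Instead of calling an intersection routine k times per round, the sketch hashes the whole "signature" of a state, which computes the same relation E'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the lecture code.
class BehavioralEquivalence {
    // Returns the canonical kernel array e with e[p] = least state equivalent to p.
    static int[] compute(int[][] delta, boolean[] isFinal) {
        int n = delta.length;
        int k = (n == 0) ? 0 : delta[0].length;
        int[] e = new int[n];
        // Initial partition: F versus Q - F.
        int firstFinal = -1, firstNonFinal = -1;
        for (int p = 0; p < n; p++) {
            if (isFinal[p]) {
                if (firstFinal < 0) firstFinal = p;
                e[p] = firstFinal;
            } else {
                if (firstNonFinal < 0) firstNonFinal = p;
                e[p] = firstNonFinal;
            }
        }
        while (true) {
            // p E' q iff p E q and d(p,s) E d(q,s) for all symbols s,
            // i.e. iff p and q have the same signature.
            Map<List<Integer>, Integer> h = new HashMap<>();
            int[] next = new int[n];
            for (int p = 0; p < n; p++) {
                List<Integer> sig = new ArrayList<>(k + 1);
                sig.add(e[p]);
                for (int a = 0; a < k; a++) sig.add(e[delta[p][a]]);
                Integer rep = h.putIfAbsent(sig, p);
                next[p] = (rep == null) ? p : rep;
            }
            if (Arrays.equals(next, e)) return e; // E' = E, so E is the behavioral equivalence
            e = next;
        }
    }
}
```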

Example 1 [figure: state diagram with an initial state and transitions labeled a and b]

Dynamic Equivalence Relations

Recall: Mazes
- Think about a grid of rooms separated by walls.
- Each room can be given a name.
a b c d
h g f e
i j k l
p o n m
Randomly knock out walls until we get a good maze.
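As a rough sketch of how wall removal ties in with equivalence classes: treat "connected by open walls" as the relation and only knock out a wall between rooms that are not yet connected. Everything here is an assumption for illustration, including the class name `MazeSketch`, the numbering of rooms row by row, and the reading of "good maze" as "every room reachable from every other, with no cycles". The tiny `find` below uses no balancing at all.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of the lecture code.
class MazeSketch {
    // Returns the removed walls as pairs {roomA, roomB}.
    static List<int[]> knockOutWalls(int rows, int cols, long seed) {
        int n = rows * cols;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i; // every room starts in its own class
        List<int[]> walls = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int room = r * cols + c;
                if (c + 1 < cols) walls.add(new int[] {room, room + 1});    // wall to the right
                if (r + 1 < rows) walls.add(new int[] {room, room + cols}); // wall below
            }
        }
        Collections.shuffle(walls, new Random(seed));
        List<int[]> removed = new ArrayList<>();
        for (int[] w : walls) {
            int a = find(parent, w[0]), b = find(parent, w[1]);
            if (a != b) {        // rooms not yet connected: remove the wall
                parent[a] = b;
                removed.add(w);
            }
        }
        return removed;
    }

    static int find(int[] parent, int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }
}
```

Under these assumptions a grid with n rooms always ends up with exactly n - 1 removed walls.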

The Party Problem You arrive at a party. As usual, there are separate groups of people standing around. In each group people talk to each other, but they don't talk to anyone outside of the group. You scan the groups, find someone that you know and join the corresponding group. If someone in another group knows you too, the two groups merge. How do we figure out the groups given a list of “is-friend-of” relations? The list is revealed step by step; we don't have access to the whole list from the start.

Dynamic E-Relations So far we have only dealt with static equivalence relations: the whole relation is given from the start and we can represent it by the canonical kernel function. Often that is not the case: all we have is knowledge about some equivalent pairs (x,y) of elements. The corresponding equivalence relation is thus given implicitly. This is really a closure problem: we have some (arbitrary) relation R and we want to compute the least equivalence relation eqc(R) that contains R.

Say What? R is arbitrary. We want S such that - x R y implies x S y - S is reflexive, symmetric and transitive - S is the finest (least) such relation. Thus x S y only if this is forced by R and the equivalence condition. We do not frivolously identify elements.

Transitivity Making S reflexive and symmetric is no problem: we can just make R reflexive and symmetric. The difficult part is transitivity: Whenever there is a chain x_1 R x_2 R x_3 ... x_{n-1} R x_n we need to set x_1 S x_n.

Static is Easy If R is static this is an old problem: Think of R as a graph and use DFS/BFS or Warshall. But what to do when the pairs in R pop up one after the other?

Kernel Schmernel Suppose we have the canonical kernel representation f for S. If we get another pair (x,y), how can we update S? If already f(x) = f(y) we're OK. But otherwise we have to scan the whole array to update the entries affected by setting x equivalent to y. This takes time linear in n. Problem: Our representation is too uptight.
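To see where the linear cost comes from, here is a minimal sketch of the canonical representation with updates. The class `CanonicalKernel` and the method name `merge` are made up for this example, and elements are numbered 0..n-1 for convenience.

```java
// Hypothetical helper, not part of the lecture code.
class CanonicalKernel {
    final int[] f; // f[x] = least element equivalent to x

    CanonicalKernel(int n) {
        f = new int[n];
        for (int x = 0; x < n; x++) f[x] = x; // initially every element is alone
    }

    // O(1) query.
    boolean equivalent(int x, int y) {
        return f[x] == f[y];
    }

    // O(n) update: every entry of the losing class has to be rewritten.
    void merge(int x, int y) {
        int a = f[x], b = f[y];
        if (a == b) return; // already equivalent
        int keep = Math.min(a, b), drop = Math.max(a, b);
        for (int z = 0; z < f.length; z++) {
            if (f[z] == drop) f[z] = keep;
        }
    }
}
```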

Fixed Points We need to relax the conditions on f a little. But how? Let FP(f,x) be the element z such that f(z) = z and f^k(x) = z for some k ≥ 0. Needless to say, fixed points do not exist in general, but we will make sure that f is constructed properly so that there is no problem.

FP versus EQ Let's say that f represents relation R if x R y iff FP(f,x) = FP(f,y). Clearly R has to be an equivalence relation. Note that the canonical kernel function would work here. But the whole point is that many other functions also work. And that makes it much easier to update. Also note: a query “x R y?” is no longer O(1) but a priori only O(n).

Testing Equivalence To test whether x is equivalent to y we do x' = FP(f,x); y' = FP(f,y); return ( x' == y' ); Running time is clearly O(n). But if we use a “good” f it can be close to O(1).

Updating Equivalence Suppose we are told that x is equivalent to y. To update, do the following: x' = FP(f,x); y' = FP(f,y); if( x' != y' ) then f[x'] = y' or f[y'] = x'; Picking the right alternative will be important for running time.
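The test and update steps above can be sketched directly, without any of the heuristics discussed later. The class `FixedPointRep` and its method names are assumptions for this illustration; the update always chooses f[x'] = y', which is one of the two alternatives and not necessarily the better one.

```java
// Hypothetical helper, not part of the lecture code.
class FixedPointRep {
    final int[] f;

    FixedPointRep(int n) {
        f = new int[n];
        for (int x = 0; x < n; x++) f[x] = x; // every element is its own fixed point
    }

    // FP(f,x): follow f until a fixed point is reached.
    int fp(int x) {
        while (f[x] != x) x = f[x];
        return x;
    }

    boolean equivalent(int x, int y) {
        return fp(x) == fp(y);
    }

    void makeEquivalent(int x, int y) {
        int a = fp(x), b = fp(y);
        if (a != b) f[a] = b; // only a fixed point is redirected, so no cycle can arise
    }
}
```

Only fixed points are ever redirected, which is what keeps FP well defined after every update.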

Union-Find In the world of programming the key operations are called - find(x) return the fixed point - union(x,y) union the classes of x and y So far, this is clever but not too exciting: both operations may be linear in n. We need to be more careful about how to perform the union operation. Note that our definition of representation gives us a lot of leeway.

Example
start:       {1} {2} {3} {4} {5} {6} {7}
union(2,3):  {1} {2,3} {4} {5} {6} {7}
union(3,4):  {1} {2,3,4} {5} {6} {7}
union(5,6):  {1} {2,3,4} {5,6} {7}
union(6,3):  {1} {2,3,4,5,6} {7}
union(2,6):  {1} {2,3,4,5,6} {7}

Think Tree It is helpful to think of the representing function f as a collection of rooted trees: f(x) is the parent of x, and the fixed points of f are the roots.

Keeping the Trees Shallow If we think of f as a collection of rooted trees it is natural to try to keep the depth of these trees small. Several plausible strategies: Union by depth: attach more shallow tree to deeper one. Union by size: attach smaller tree to larger one.
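Union by depth is not what the lecture code uses (that code does union by size), so here is a separate minimal sketch of it. The class `UnionByDepth` and the explicit `depth` array are assumptions for this illustration, and `find` performs no path compression so that the stored depths stay exact.

```java
// Hypothetical helper, not part of the lecture code.
class UnionByDepth {
    final int[] parent;
    final int[] depth; // depth[r] = depth of the tree rooted at r (meaningful for roots only)

    UnionByDepth(int n) {
        parent = new int[n];
        depth = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    int find(int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    void union(int x, int y) {
        int a = find(x), b = find(y);
        if (a == b) return;
        if (depth[a] < depth[b]) { int t = a; a = b; b = t; } // make a the deeper root
        parent[b] = a;                        // attach the more shallow tree to the deeper one
        if (depth[a] == depth[b]) depth[a]++; // depth grows only when both were equally deep
    }
}
```

The depth of a tree increases only when two equally deep trees are merged, which is why a tree of depth d contains at least 2^d nodes.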

A Trick: Path Compression Since we have to traverse a path from a node to the root anyway, we might as well attach all the nodes on that path directly to the root. E.g., find(0) would produce: [figure: tree after path compression]
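A recursive formulation makes the trick very short. This sketch is an alternative to the iterative two-pass version in the lecture code, not a replacement for it; the class name `CompressingFind` is made up, and the array convention (a negative entry marks a root) is borrowed from the lecture's `u` array.

```java
// Hypothetical helper, not part of the lecture code.
class CompressingFind {
    final int[] u; // u[i] >= 0: parent of i; u[i] < 0: i is a root

    CompressingFind(int[] u) {
        this.u = u;
    }

    int find(int i) {
        if (u[i] < 0) return i; // reached the root
        u[i] = find(u[i]);      // point i directly at the root on the way back
        return u[i];
    }
}
```

For very long paths the recursion depth equals the path length, which is one reason an iterative version may be preferred in practice.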

How Hard to Implement? One might wonder how hard it is to code all these tricks (without union by size/depth and path compression the code is nearly trivial). Also, what is the actual payoff in the end? As it turns out, the code is really simple, and the payoff is tremendous.

The Code

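All the code
class UnionFind {
    // u[i] >= 0: parent of i; u[i] < 0: i is a root and -u[i] is the size of its tree
    int[] u;

    UnionFind(int n) {
        u = new int[n];
        for (int i = 0; i < n; i++) u[i] = -1;
    }

    int find(int i) {
        int j, root;
        for (j = i; u[j] >= 0; j = u[j]) ;                   // first pass: locate the root
        root = j;
        while (u[i] >= 0) { j = u[i]; u[i] = root; i = j; }  // second pass: path compression
        return root;
    }

    void union(int i, int j) {
        i = find(i); j = find(j);
        if (i != j) {
            if (u[i] < u[j]) { u[i] += u[j]; u[j] = i; }     // tree i is larger: attach j to i
            else { u[j] += u[i]; u[i] = j; }                 // otherwise attach i to j
        }
    }
}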

The UnionFind class class UnionFind { int[] u; UnionFind(int n) { u = new int[n]; for (int i = 0; i < n; i++) u[i] = -1; } int find(int i) {... } void union(int i,int j) {... } }

Iterative find
int find(int i) {
    int j, root;
    for (j = i; u[j] >= 0; j = u[j]) ;                   // first pass: locate the root
    root = j;
    while (u[i] >= 0) { j = u[i]; u[i] = root; i = j; }  // second pass: path compression
    return root;
}

union by size
void union(int i, int j) {
    i = find(i); j = find(j);
    if (i != j) {
        if (u[i] < u[j]) { u[i] += u[j]; u[j] = i; }     // tree i is larger: attach j to i
        else { u[j] += u[i]; u[i] = j; }                 // otherwise attach i to j
    }
}

Time bounds
Variables: M operations, N elements.
Algorithms:
- Simple forest representation. Worst: find O(N), mixed operations O(MN). Average: tricky.
- Union by height; union by size. Worst: find O(log N), mixed operations O(M log N). Average: mixed operations O(M) [see text].
- Path compression in find. Worst: mixed operations “nearly linear” [analysis in ].