Lecture 9 Disjoint Set ADT
Preliminary Definitions A set is a collection of objects. Set A is a subset of set B if all elements of A are in B. Subsets are sets Union of two sets A and B is a set C which consists of all elements in A and B Two sets are mutually disjoint if they do not have a common element.
A partition of a set is a collection of subsets such that Union of all these subsets is the set itself Any two subsets are mutually disjoint Note down examples from board. Equivalence relations constitute a partition
Union and Find Operations Operations on partitions. Union Need to form union of two different sets of a partition Find Need to find out which set an element belongs to
Every set in the partition has a number. The numbers can be anything as long as different sets have distinct numbers. Find(a) returns the number of the set containing a. Can two different sets contain the same element?
Disjoint Set Data Structure The set is represented by the root of the tree. The number assigned to a set is the number of the root element. Every element has a number. Elements of a set are stored in a tree (not necessarily binary) Are the numbers distinct for different sets?
Find(a) returns the number of the root node of the tree containing a. Union operation makes one tree sub-tree of another Root of one tree becomes child of the root of another. Note down example from board.
Tree Representation Will use an array based representation, array S Let the elements be 1,2,….N S[j] contains the number for the parent of j S[j] = 0 if j is the root. Initially all trees are singletons Trees build up with unions. Note down example from board Note that we don’t use any pointers here.
Pseudo Code for Find Find(a) { If S[a] = 0, return a; else Find(S[a]); return; } Complexity?
Pseudo-Code for Union Union(root 1, root 2) { S[root2] = root1; } Complexity?
More Efficient Union Will improve the worst case find complexity to log N When we do a union operation make the smaller tree a subtree of the bigger one Thus the root of the smaller subtree becomes a child of the root of the bigger one. Note down example from board. Alternatively, union operation can be done by height as well Tree of lesser height is made subtree of the other. We consider only size here.
Array storage changes somewhat. If j is a root, S[j] = - size of tree rooted at j If j is not a root, S[j] = parent of j Why is S[j] not equal to the size of tree j, if j is a root? Initially, what is the content of array S? Note down example from board
Pseudo-Code for Union Union(root 1, root 2) { If S[root2] < S[root1], S[root1] = root2; else S[root2]=root1; } Complexity?
Pseudo Code for Find Find(a) { If S[a] < 0, return a; else Find(S[a]); return; }
Complexity Analysis for Find Operation If the depth of a node A increases, then the earlier tree consisting the node becomes a subtree of another. Since only a smaller tree becomes a subtree of another, total size of the combined tree must be at least twice the previous one consisting A. Each time depth of a node increases, the size of the tree increases by at least a factor of 2. At first every node has depth 0
Next time depth is 1, tree size is at least 2, depth is 2, tree size is at least 4… depth is k, tree size is at least 2 k We know that 2 k <= N Thus k <= log N Complexity of Find operation is O(log N) Complexity of any M operations is O(MlogN)
Path Compression Makes all operations almost linear in the worst case. Whenever you do Find(j) make S[k]=Find(j) for all elements on the path of j to the root, except the root. All nodes on the path of j now point to the root directly Later Find operation on these will have lower costs as their depths have been reduced. Any Find operation reduces the cost of future ones.
Pseudo Code for New Find Find(a) { If S[a] < 0, return a; else S[a]=Find(S[a]); return; }
Complexity Analysis Any M operations take O(Mlog * N) if M is (N) log * N is the number of times we take loglog….logN so as to get a number less than or equal to 1 (say base 2, even otherwise asymptotic order remains the same). log * N grows very slowly with N and is less than 4 or 5 for all practical values of N, log * 2 32 is less than 5! Thus the worst case complexity is linear for all practical purposes.
Reading Assignment Chapter 8, till section (i.e. section onwards can be omitted).