Nirmalya Roy School of Electrical Engineering and Computer Science Washington State University Cpt S 223 – Advanced Data Structures Disjoint Sets.

Preliminaries on Sets: In mathematics, two sets are said to be disjoint if they have no element in common; for example, {1, 2, 3} and {4, 5, 6} are disjoint sets. Formally, two sets A and B are disjoint if their intersection is the empty set, i.e., if A ∩ B = ∅. The definition extends to any collection of sets: a collection of sets is pairwise disjoint (or mutually disjoint) if any two distinct sets in the collection are disjoint.

Preliminaries on Sets (cont’d): For example, the collection of sets { {1}, {2}, {3}, ... } is pairwise disjoint. If {A_i} is a pairwise disjoint collection (containing at least two sets), then clearly its intersection is empty. However, the converse is not true: the intersection of the collection {{1, 2}, {2, 3}, {3, 1}} is empty, but the collection is not pairwise disjoint; in fact, no two sets in this collection are disjoint. A partition of a set X is any collection of non-empty subsets {A_i : i ∈ I} of X such that the A_i are pairwise disjoint and their union is X, i.e., ⋃_{i ∈ I} A_i = X.

Example: A = {3}, B = {1, 3, 5} and C = {4, 5, 6}. Are A, B and C disjoint? Yes: the common intersection A ∩ B ∩ C is empty. Are A, B and C pairwise disjoint? No: the intersection of A and B is not empty ({3}), and the intersection of B and C is not empty ({5}).
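
The distinction between an empty common intersection and pairwise disjointness can be checked mechanically. The following is a minimal sketch, assuming sets of integers stored in std::set; the function names areDisjoint, hasEmptyIntersection and isPairwiseDisjoint are illustrative and do not come from the slides.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// True if a and b share no element.
bool areDisjoint(const std::set<int>& a, const std::set<int>& b) {
    for (int x : a) {
        if (b.count(x) > 0) return false;
    }
    return true;
}

// True if no element belongs to every set of the collection.
// An empty collection is treated as having an empty intersection
// (an assumption made only to keep the sketch total).
bool hasEmptyIntersection(const std::vector<std::set<int>>& sets) {
    if (sets.empty()) return true;
    for (int x : sets[0]) {
        bool inAll = true;
        for (std::size_t i = 1; i < sets.size(); ++i) {
            if (sets[i].count(x) == 0) { inAll = false; break; }
        }
        if (inAll) return false;
    }
    return true;
}

// True if every pair of distinct sets in the collection is disjoint.
bool isPairwiseDisjoint(const std::vector<std::set<int>>& sets) {
    for (std::size_t i = 0; i < sets.size(); ++i) {
        for (std::size_t j = i + 1; j < sets.size(); ++j) {
            if (!areDisjoint(sets[i], sets[j])) return false;
        }
    }
    return true;
}
```

For A = {3}, B = {1, 3, 5}, C = {4, 5, 6}, hasEmptyIntersection reports true while isPairwiseDisjoint reports false, matching the two answers in the example.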

Disjoint Sets: A data structure for problems requiring equivalence relations, that is, for asking whether two elements are in the same equivalence class. Disjoint sets provide a simple, fast solution. Simple: array-based implementation. Fast: O(1) per operation in the average case. The analysis is challenging.

Relation and Equivalence Relation: A relation R is defined on a set S if, for every pair of elements (a, b) where a, b are in S, (a R b) is either true or false. An equivalence relation is a relation R that satisfies three properties: (Reflexive) (a R a), for each element a in S; (Symmetric) (a R b) if and only if (b R a); (Transitive) (a R b) and (b R c) implies (a R c). The equivalence class of an element a (in S) is the subset of S that contains all elements related to a.

Equivalence Relation (cont’d): Examples. Equality (=) over integers: (Reflexive) a = a for all integers a; (Symmetric) a = b iff b = a; (Transitive) a = b and b = c implies a = c. Electrical connectivity: (Reflexive) a component is connected to itself; (Symmetric) if a is electrically connected to b, then b is electrically connected to a; (Transitive) if a is connected to b and b is connected to c, then a is connected to c. Cities belonging to the same country (if all the roads are two-way).

Equivalence Relation (cont’d): Not an equivalence relation: ≤. (Reflexive) (a ≤ a), for all a. (Transitive) If (a ≤ b) and (b ≤ c) then (a ≤ c). (NOT Symmetric) (a ≤ b) DOES NOT IMPLY (b ≤ a).
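
On a small finite set, the three properties can be verified by brute force. The sketch below assumes a relation given as a predicate over int values; the alias Relation and the function names are illustrative only.

```cpp
#include <functional>
#include <vector>

// A relation on integers: r(a, b) is true when a is related to b.
using Relation = std::function<bool(int, int)>;

// (a R a) for every a in s.
bool isReflexive(const std::vector<int>& s, const Relation& r) {
    for (int a : s) {
        if (!r(a, a)) return false;
    }
    return true;
}

// (a R b) if and only if (b R a).
bool isSymmetric(const std::vector<int>& s, const Relation& r) {
    for (int a : s) {
        for (int b : s) {
            if (r(a, b) != r(b, a)) return false;
        }
    }
    return true;
}

// (a R b) and (b R c) implies (a R c).
bool isTransitive(const std::vector<int>& s, const Relation& r) {
    for (int a : s) {
        for (int b : s) {
            for (int c : s) {
                if (r(a, b) && r(b, c) && !r(a, c)) return false;
            }
        }
    }
    return true;
}

bool isEquivalence(const std::vector<int>& s, const Relation& r) {
    return isReflexive(s, r) && isSymmetric(s, r) && isTransitive(s, r);
}
```

On S = {1, 2, 3}, equality passes all three checks, whereas an ordering comparison such as a <= b passes the reflexive and transitive checks but fails the symmetric one.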

Equivalence Class: Given a set S and an equivalence relation R, find the subsets S_i of S such that: for all a, b ∈ S_i, (a R b); and for all a ∈ S_i, b ∈ S_j with i ≠ j, not (a R b). These S_i are the equivalence classes of S for relation R. The S_i are “disjoint sets” and form a partition of S. Example: S = {1,2,3,4,3,3,2,1,3}, R is the equality (=); find the equivalence classes.

A Motivating Problem for Disjoint Sets: Given a set S of n elements [a_1, …, a_n], compute the equivalence classes of all its elements.

Properties of Equivalence Classes OBSERVATION:  Each element has to belong to exactly one equivalence class COROLLARY:  All equivalence classes are mutually disjoint What we are after is the set of all “maximal” equivalence classes

Disjoint Set Operations To identify all equivalence classes 1. Initially, put each element in a set of its own 2. Permit only two types of operations:  find(x): Returns the equivalence class of x  union(x, y): Merges the equivalence classes corresponding to elements x and y, if and only if x is “related” to y

Steps in union(x, y), performed if x is “related” to y:
1. EqClass_x = find(x)
2. EqClass_y = find(y)
3. EqClass_xy = EqClass_x ∪ EqClass_y

A Naïve Algorithm for Equivalence Class Computation:
1. Initially, put each element in a set of its own, that is, EqClass_x = {x} for every x ∈ S.
2. For each element pair (x, y):
   a. Check whether (x R y) is true.
   b. If (x R y) is true, perform union(x, y): EqClass_x = find(x); EqClass_y = find(y); EqClass_xy = EqClass_x ∪ EqClass_y.
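
The naive procedure can be sketched as follows. This is one possible reading, assuming each class is represented by an integer label per position and that a union relabels one class to the other; the function name naiveEquivalenceClasses and the label scheme are assumptions, not the slides' code.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Returns one class label per position of `items`. Two positions receive the
// same label exactly when they are linked by a chain of related pairs.
std::vector<int> naiveEquivalenceClasses(
        const std::vector<int>& items,
        const std::function<bool(int, int)>& related) {
    const std::size_t n = items.size();

    // Step 1: every element starts in a class of its own.
    std::vector<int> label(n);
    for (std::size_t i = 0; i < n; ++i) label[i] = static_cast<int>(i);

    // Step 2: examine every pair and merge the classes of related elements.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            if (!related(items[i], items[j])) continue;
            const int to = label[i];    // find(x)
            const int from = label[j];  // find(y)
            if (from == to) continue;   // already in the same class
            // union: relabel every member of y's class, O(n) per merge.
            for (std::size_t k = 0; k < n; ++k) {
                if (label[k] == from) label[k] = to;
            }
        }
    }
    return label;
}
```

Run on S = {1,2,3,4,3,3,2,1,3} with equality as R, the four distinct labels that remain correspond to the classes of 1, 2, 3 and 4. Examining all pairs costs O(n²) relation checks before any merging cost, which is what motivates the faster structures that follow.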

Disjoint Sets Example: S = {1,2,3,4,3,3,2,1,3}, R is the equality (=)  S = {1 a, 2 a, 3 a, 4 a, 3 b, 3 c, 2 b, 1 b, 3 d }  DS = { {1 a }, {2 a }, {3 a }, {4 a }, {3 b }, {3 c }, {2 b }, {1 b }, {3 d } }  3 a R 3 b ?, 3 c R 3 d ?  DS = { {1 a }, {2 a }, {3 a,3 b }, {4 a }, {3 c,3 d }, {2 b }, {1 b } }  3 a R 3 c ?  DS = { {1 a }, {2 a }, {3 a,3 b,3 c,3 d }, {4 a }, {2 b }, {1 b } }  Continue like this…

Specification for Union-Find find(x)  Should return the ID of the equivalence class that currently has element x union(x, y)  If x and y are in two different equivalence classes, then union(x, y) should merge those two sets into one Otherwise, no change

Disjoint Sets Represent each set as a tree Tree’s root is the representative element for the set Disjoint sets are a forest of trees find(x) returns the root element of the tree containing x union(x, y) points root node of tree containing y to root node of tree containing x Implemented as array S, where S[i] = index of parent node in tree (or -1 if root)

Example Initial disjoint sets of 8 elements (really an array of size 8 of all -1) Initially, each element is put in one set of its own (start with n sets == n trees) After union(4, 5): points root node of tree containing 5 to root node of tree containing 4

Example (cont’d) After union(6, 7): After union(4, 6): Convention is that new root after union (x,y) is x The array representation:
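
The three unions in this example can be replayed on a plain parent array. The sketch assumes the eight elements are labelled 0 to 7 and uses the stated convention that the new root after union(x, y) is x; findRoot, simpleUnion and replayExample are illustrative names.

```cpp
#include <vector>

// s[i] is the parent of element i, or -1 if i is a root.
int findRoot(const std::vector<int>& s, int x) {
    while (s[x] >= 0) x = s[x];
    return x;
}

// Simple union: the root of y's tree is pointed at the root of x's tree.
void simpleUnion(std::vector<int>& s, int x, int y) {
    const int rx = findRoot(s, x);
    const int ry = findRoot(s, y);
    if (rx != ry) s[ry] = rx;
}

// Replays union(4, 5), union(6, 7), union(4, 6) on 8 singleton sets.
std::vector<int> replayExample() {
    std::vector<int> s(8, -1);  // n sets == n one-node trees
    simpleUnion(s, 4, 5);       // s[5] = 4
    simpleUnion(s, 6, 7);       // s[7] = 6
    simpleUnion(s, 4, 6);       // s[6] = 4
    return s;
}
```

Under these assumptions the resulting array is -1, -1, -1, -1, -1, 4, 4, 6 for indices 0 through 7: elements 0 to 3 are still singleton roots, 4 is the root of the merged tree, and 7 reaches it through 6.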

How to Support union() and find() Operations Efficiently? Approach 1: keep the elements in an array A, where A[i] stores the ID of the equivalence class that element i is currently in. Analysis: find(x) simply returns A[x], which takes O(1) time. union(x, y) could take up to O(n) time: assuming x is in class a and y is in class b, scan the array, changing every a to b (the class label). Therefore, a sequence of m operations could take O(mn) time in the worst case.
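
A minimal sketch of the class-label array of Approach 1 follows. The class name QuickFind and the member name id_ are assumptions; the relabelling scan mirrors the "changing all a's to b's" step described above.

```cpp
#include <vector>

class QuickFind {
public:
    // Each element starts in its own class, labelled by its own index.
    explicit QuickFind(int n) : id_(n) {
        for (int i = 0; i < n; ++i) id_[i] = i;
    }

    // A single array lookup.
    int find(int x) const { return id_[x]; }

    // Scans the whole array, changing every label a to b: O(n) per union.
    void unionSets(int x, int y) {
        const int a = id_[x];
        const int b = id_[y];
        if (a == b) return;
        for (int& label : id_) {
            if (label == a) label = b;
        }
    }

private:
    std::vector<int> id_;  // id_[i] = class label of element i
};
```

The lookup is cheap, but every union touches all n entries, which is where the O(mn) bound for m operations comes from.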

How to Support union() and find() Operations Efficiently? (cont’d) Approach 2: keep each equivalence set in a separate linked list. Analysis: this decreases the time for unions, because a union no longer has to search all n elements, just the two lists where the elements are found, and then concatenate those lists: O(size of larger list) in general, but only O(1) time if the lists are doubly linked. It increases the time to find an element: find(x) could take up to O(n) time, although slight improvements are possible (think of BSTs). A sequence of m operations will take time on the order of m log n.

How to Support union() and find() Operations Efficiently? (cont’d) Approach 3: keep each equivalence class in a separate tree, and ensure (somehow) that find() and union() take << O(log n) time. This is the data structure used for disjoint sets: a disjoint-sets structure for n elements is a forest of k trees, where 1 ≤ k ≤ n.

The Disjoint Set Data Structure Purpose: To support two basic operations efficiently:  find(x)  union(x, y) Input: An array of n elements Identify each element by its array index  Element label is equal to array index  Value does not really matter

Implementation Note: This will always be a vector, regardless of the data type of your elements. Initial # of disjoint sets

Implementation (cont’d): The array representation: entry s[i] “points” to the parent of element i; s[i] == -1 means i is a root.

Implementation (cont’d): The union is performed arbitrarily; the assignment could equally be s[root1] = root2;

Implementation (cont’d)
// a and b may be arbitrary elements (they need not be roots).
// Named unionElements here because union is a reserved word in C++.
void DisjSets::unionElements(int a, int b) { unionSets(find(a), find(b)); }
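
Putting the implementation slides together, a simple version of the class might look like the sketch below. It assumes the vector-of-parents layout described above with -1 marking a root; the wrapper is called unionElements because union is a reserved word in C++, and that name is an assumption rather than the original one.

```cpp
#include <vector>

class DisjSets {
public:
    // Initially every element is the root of its own one-node tree.
    explicit DisjSets(int numElements) : s(numElements, -1) {}

    // Follows parent links until a root (negative entry) is reached.
    int find(int x) const {
        while (s[x] >= 0) x = s[x];
        return x;
    }

    // Both arguments must be roots. The choice of new root is arbitrary;
    // here root2 is hung under root1.
    void unionSets(int root1, int root2) {
        if (root1 != root2) s[root2] = root1;
    }

    // a and b may be arbitrary elements (they need not be roots).
    void unionElements(int a, int b) { unionSets(find(a), find(b)); }

private:
    std::vector<int> s;  // s[i] = parent of i, or -1 if i is a root
};
```

Usage follows the two-step pattern of the slides: locate the two roots with find, then link them with unionSets. Nothing here limits tree depth, so a find can still walk a long chain.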

Analysis of the Simple Version: Each find(x) could take O(n) time, proportional to the depth of the tree containing x. Each union(x, y) could also take O(n) time, although each unionSets() takes only O(1) time in the worst case. Therefore, m operations, where m >> n, would take O(mn) time in the worst case.

Smart Union Algorithms Problem with the arbitrary union strategy in the simple approach is that:  The tree, in the worst-case, could just grow along one long O(n) path Solution: Prevent formation of such long chains  Enforce union() to happen in a “balanced” way Two heuristics  Union-by-size  Union-by-height

Union-by-Size: Attach the root of the “smaller” tree to the root of the “larger” tree, with respect to size. Example: for union(3, 7), the tree rooted at 3 has size 1 and the tree containing 7 (rooted at 4) has size 4, so attach root 3 to root 4.

Union-by-Size (cont’d): [Figure: result of union(3, 7) using the union-by-size heuristic, compared with the result of union(3, 7) using simple union.] Using simple union could leave the tree unbalanced.

Union-by-Size (cont’d): Link the smaller tree to the larger tree. A sequence of m operations requires O(m) time on average: random unions tend to merge large sets with small sets, and thus only increase the depth of the smaller set. Implementation: use “−size” instead of “−1” for root entries. [Figure: the array representation, with trees of size 1 and size 5.]
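
A sketch of union-by-size using the “−size” root encoding is given below. The class name DisjSetsBySize and the helper sizeOf are illustrative; ties are broken in favour of the first root, which is an arbitrary choice.

```cpp
#include <utility>
#include <vector>

class DisjSetsBySize {
public:
    // A root stores the negated size of its tree; -1 means a tree of size 1.
    explicit DisjSetsBySize(int numElements) : s(numElements, -1) {}

    int find(int x) const {
        while (s[x] >= 0) x = s[x];
        return x;
    }

    // Both arguments must be roots. Returns the root of the merged tree.
    int unionSets(int root1, int root2) {
        if (root1 == root2) return root1;
        // A more negative entry means a larger tree; keep the larger as root1.
        if (s[root2] < s[root1]) std::swap(root1, root2);
        s[root1] += s[root2];  // combined size, still stored negated
        s[root2] = root1;      // smaller tree hangs under the larger root
        return root1;
    }

    int sizeOf(int x) const { return -s[find(x)]; }

private:
    std::vector<int> s;
};
```

With a four-element tree rooted at 4 and a singleton 3, unionSets(find(3), find(7)) attaches 3 under 4 and the root entry becomes -5, so no existing node of the larger tree gets any deeper.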

Union-by-Height: Attach the root of the “shallower” tree to the root of the “deeper” tree. Example: for union(3, 7), the tree rooted at 3 has height 0 and the tree containing 7 (rooted at 4) has height 2, so attach root 3 to root 4.

Union-by-Height (cont’d): Need to keep track of the height of each tree, rather than its size. Link the smaller-height tree to the larger-height tree. The height only increases when two equal-height trees are unioned, which gives O(log n) maximum depth and hence O(m log n) worst-case time for m operations. Implementation: for root entries, store the negative of the height, minus 1 (that is, −height − 1), so that a single-node tree of height 0 still stores −1.

Union-by-Height (cont’d) The array representation: Height is 0 Height is 2

Union-By-Height Implementation
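
One way the union-by-height routine could be written is sketched below, assuming root entries store −height − 1 as described on the earlier slide. The class name DisjSetsByHeight and the helper heightOf are illustrative.

```cpp
#include <vector>

class DisjSetsByHeight {
public:
    // A root stores -(height) - 1, so a one-node tree (height 0) stores -1.
    explicit DisjSetsByHeight(int numElements) : s(numElements, -1) {}

    int find(int x) const {
        while (s[x] >= 0) x = s[x];
        return x;
    }

    // Both arguments must be roots.
    void unionSets(int root1, int root2) {
        if (root1 == root2) return;
        if (s[root2] < s[root1]) {
            // root2 is deeper: make root2 the new root, height unchanged.
            s[root1] = root2;
        } else {
            // Equal heights: the merged tree is one level taller.
            if (s[root1] == s[root2]) --s[root1];
            s[root2] = root1;
        }
    }

    int heightOf(int x) const { return -s[find(x)] - 1; }

private:
    std::vector<int> s;
};
```

The stored height changes only in the equal-height branch, which is why heights (unlike sizes) are updated rarely.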

Analysis of Smart Union Heuristics For smart union (by rank or by size)  find() takes O(log n) time  union() takes O(log n) time  unionSets() takes O(1) time For m operations: O(m log n) time Can we do better?  What is still causing the (log n) factor is the distance of the nodes from the root  Idea: Get the nodes as close as possible to the root  The above idea is called path compression

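Path Compression: During a find(x) operation, update all the nodes along the path from x to the root so that they point directly to the root. This is a two-pass algorithm: the first pass walks from x up to the root, and the second pass revisits the nodes on that path and re-points each one at the root.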

Path Compression Example After find(14):

Path Compression Implementation: s[x] is made equal to the value returned by the recursive find, so x’s parent link references the root of the set. This occurs recursively for every node on the path to the root.
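
A recursive find with path compression can be sketched as follows. The class name CompressingDisjSets and the accessor parentOf are assumptions added so that the effect on the parent links can be observed; the union here is the simple, arbitrary one.

```cpp
#include <vector>

class CompressingDisjSets {
public:
    explicit CompressingDisjSets(int numElements) : s(numElements, -1) {}

    // Returns the root of x's tree and re-points every node on the
    // path from x to the root directly at that root.
    int find(int x) {
        if (s[x] < 0) return x;    // x is a root
        return s[x] = find(s[x]);  // compress on the way back out
    }

    // Simple union; both arguments must be roots.
    void unionSets(int root1, int root2) {
        if (root1 != root2) s[root2] = root1;
    }

    // Exposes the raw parent link (or -1 for a root) for inspection.
    int parentOf(int x) const { return s[x]; }

private:
    std::vector<int> s;
};
```

If a chain 4 → 3 → 2 → 1 → 0 is built first, a single find(4) returns 0 and leaves 1, 2, 3 and 4 all pointing straight at 0, so later finds on any of them take one step. The recursion depth equals the path length, so very deep trees may call for an iterative two-pass version instead.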

Path Compression with Smart Union: Path compression works as is with union-by-size (tree sizes don’t change). Path compression with union-by-height would require re-computation of heights. Solution: don’t re-compute the heights. The stored heights become (possibly over-) estimates of the true height; they are also called “ranks”, and this solution is called “union-by-rank”. Ranks are modified far less often than sizes, so this is slightly faster in practice. Path compression does not change the average-case time, but does reduce the worst-case time.

Analysis of Union-by-Rank and Path Compression: The worst case is Θ(M α(M, N)), where M is the number of operations (find, union), N is the number of elements in the disjoint set, and α(M, N) is the inverse of Ackermann’s function. In practice, α(M, N) ≤ 4. Thus, the worst case is effectively Θ(M) for M operations.

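Ackermann’s Function: A(i, j), tabulated for i = 1 to 3 and j = 1 to 4. Row i = 1: 2^1 = 2, 2^2 = 4, 2^3 = 8, 2^4 = 16. Rows i = 2 and i = 3 grow far more rapidly; the i = 3 row ends in an entry marked simply BIG.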

Inverse of Ackermann’s Function

Analysis of Union-by-Rank and Path Compression: The worst case is Θ(M α(M, N)) for M operations on a disjoint set with N elements, which is technically not linear in M. Any sequence of M = Ω(N) union/find operations takes O(M log* N) time.

Heuristics and Their Gains: worst-case run-time for m operations
Arbitrary union, simple find: O(mn)
Union-by-size, simple find: O(m log n)
Union-by-rank, simple find: O(m log n)
Arbitrary union, path compression find: O(m log n)
Union-by-rank, path compression find: O(m log* n)
log* n is pronounced “log star n”. log* n is the number of times we have to repeatedly take the logarithm (base 2) of n to bring the value down to 1, i.e., the number of logs in log log log … n. E.g., log* 65536 = 4 and log* 2^65536 = 5. Therefore, log* n is an extremely slow-growing function.
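
To get a feel for how slowly log* grows, it can be computed directly. The sketch below assumes base-2 logarithms and the definition "number of times log must be applied until the value is at most 1". Instead of repeatedly calling a floating-point logarithm, it compares n against the tower 1, 2, 4, 16, 65536, ..., which is an equivalent formulation that avoids rounding issues. The function name logStar is illustrative.

```cpp
#include <cmath>

// Iterated logarithm, base 2: the smallest k such that a tower of k twos
// (1, 2, 2^2, 2^(2^2), ...) is at least n.
int logStar(double n) {
    int k = 0;
    double tower = 1.0;  // tower of zero twos
    while (tower < n) {
        // Next tower value is 2^tower; ldexp computes it exactly and
        // overflows to infinity past 2^1023, which ends the loop.
        tower = std::ldexp(1.0, static_cast<int>(tower));
        ++k;
    }
    return k;
}
```

Under this definition logStar(16) is 3 and logStar(65536) is 4, and every larger finite double maps to 5, which illustrates why O(m log* n) is treated as practically linear.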

Application: Maze Generation. Start with walls everywhere except at the upper-left corner and the lower-right corner. Randomly choose a wall, and knock it down if the cells that it separates are not already connected. Continue until the start (upper-left corner) and finish (lower-right corner) cells are connected, or continue until all cells are connected (this gives more dead ends). Then we have a maze. [Figure: a 50-by-88 maze in which the top-left cell is connected to the bottom-right cell, and cells are separated from their neighboring cells by walls.]
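
The "continue until all cells are connected" variant can be sketched with a small union-find. The sketch assumes cells are numbered row * cols + col, visits the interior walls once in shuffled order (equivalent to repeatedly picking a random untested wall), and uses a seeded std::mt19937 for reproducibility. UnionFind, unite and generateMaze are illustrative names.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Compact union-find with path compression; parent[i] == i marks a root.
struct UnionFind {
    std::vector<int> parent;

    explicit UnionFind(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }

    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }

    // Merges the sets of a and b; returns false if already connected.
    bool unite(int a, int b) {
        const int ra = find(a);
        const int rb = find(b);
        if (ra == rb) return false;
        parent[rb] = ra;
        return true;
    }
};

// Returns the walls that were knocked down, each as a pair of adjacent cells.
std::vector<std::pair<int, int>> generateMaze(int rows, int cols, unsigned seed) {
    // List every interior wall between horizontally or vertically adjacent cells.
    std::vector<std::pair<int, int>> walls;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const int cell = r * cols + c;
            if (c + 1 < cols) walls.push_back({cell, cell + 1});
            if (r + 1 < rows) walls.push_back({cell, cell + cols});
        }
    }

    std::mt19937 rng(seed);
    std::shuffle(walls.begin(), walls.end(), rng);

    UnionFind cells(rows * cols);
    std::vector<std::pair<int, int>> removed;
    for (const auto& w : walls) {
        // Knock the wall down only if its two cells are not yet connected.
        if (cells.unite(w.first, w.second)) removed.push_back(w);
    }
    return removed;
}
```

Because a wall is removed only when it joins two different sets, exactly rows * cols - 1 walls come down and there is a unique path between any two cells.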

Maze Generation Example Initial state: All walls up, all cells in their own set/equivalence class Use Union/Find to represent sets of cells that are connected to each other

Maze Generation Example (cont’d): Intermediate state: a few walls are knocked down. The wall that connects cells 8 and 13 is randomly targeted. Next, the wall between cells 18 and 13 is randomly targeted: perform two find operations and, since the cells are in different sets, knock down the wall that separates them and combine the two sets via a union operation.

Maze Generation Example (cont’d) After joining 13 and 18 from previous intermediate state:

Maze Generation Example (cont’d) Final state: All cells are connected

More Applications: Finding the connected components of an undirected graph. Computing shorelines of a terrain. Molecular identification from fragmentation. Image processing: movie coloring.
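
As an illustration of the first application, the number of connected components of an undirected graph can be counted with one union per edge. The sketch assumes vertices numbered 0 to n - 1 and an edge list of pairs; it uses path halving, a simpler iterative variant of path compression, and the function name countComponents is illustrative.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Counts connected components of an undirected graph on vertices 0..n-1.
int countComponents(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);  // every vertex is its own root

    // Iterative find with path halving: each visited node is re-pointed
    // at its grandparent, shortening the path as it is walked.
    auto find = [&parent](int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    };

    int components = n;
    for (const auto& e : edges) {
        const int ra = find(e.first);
        const int rb = find(e.second);
        if (ra != rb) {
            parent[rb] = ra;  // merge the two components
            --components;
        }
    }
    return components;
}
```

Each edge that joins two different sets reduces the count by one; edges inside an existing component leave it unchanged.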

Summary: The disjoint sets data structure provides a simple, fast solution to equivalence problems: array-based implementation, average-case O(1) time per operation. Consider alternatives when a particular step is not totally specified: Simple Union, Union-by-Size, Union-by-Rank; the flexibility of union helps to design efficient algorithms. Path compression is one of the earliest forms of self-adjustment (like splay trees and skew heaps): a simple algorithm with a not-so-simple worst-case analysis.