Lecture 16: Union and Find for Disjoint Data Sets Shang-Hua Teng.

Slides:



Advertisements
Similar presentations
1 Union-find. 2 Maintain a collection of disjoint sets under the following two operations S 3 = Union(S 1,S 2 ) Find(x) : returns the set containing x.
Advertisements

1 Disjoint Sets Set = a collection of (distinguishable) elements Two sets are disjoint if they have no common elements Disjoint-set data structure: –maintains.
© 2004 Goodrich, Tamassia Union-Find1 Union-Find Partition Structures.
Union-Find: A Data Structure for Disjoint Set Operations
Disjoint Sets Given a set {1, 2, …, n} of n elements. Initially each element is in a different set.  {1}, {2}, …, {n} An intermixed sequence of union.
CSE 326: Data Structures Disjoint Union/Find Ben Lerner Summer 2007.
Disjoint Union / Find CSE 373 Data Structures Lecture 17.
CSE 326: Data Structures Disjoint Union/Find. Equivalence Relations Relation R : For every pair of elements (a, b) in a set S, a R b is either true or.
Union-Find Problem Given a set {1, 2, …, n} of n elements. Initially each element is in a different set. {1}, {2}, …, {n} An intermixed sequence of union.
CPSC 411, Fall 2008: Set 7 1 CPSC 411 Design and Analysis of Algorithms Set 7: Disjoint Sets Prof. Jennifer Welch Fall 2008.
Minimum Spanning Trees (MST)
CPSC 311, Fall CPSC 311 Analysis of Algorithms Disjoint Sets Prof. Jennifer Welch Fall 2009.
© 2004 Goodrich, Tamassia Union-Find1 Union-Find Partition Structures.
Data Structures, Spring 2004 © L. Joskowicz 1 Data Structures – LECTURE 17 Union-Find on disjoint sets Motivation Linked list representation Tree representation.
CSE 373, Copyright S. Tanimoto, 2002 Up-trees - 1 Up-Trees Review of the UNION-FIND ADT Straight implementation with Up-Trees Path compression Worst-case.
CS2420: Lecture 42 Vladimir Kulyukin Computer Science Department Utah State University.
0 Course Outline n Introduction and Algorithm Analysis (Ch. 2) n Hash Tables: dictionary data structure (Ch. 5) n Heaps: priority queue data structures.
Kruskal’s algorithm for MST and Special Data Structures: Disjoint Sets
Theory of Computing Lecture 10 MAS 714 Hartmut Klauck.
Chapter 6: Union-Find and Related Structures CS6310 ADVANCED DATA STRUCTURE SHADHA MUHI & HASNAA IMAD.
Union-Find Problem Given a set {1, 2, …, n} of n elements. Initially each element is in a different set.  {1}, {2}, …, {n} An intermixed sequence of.
Computer Algorithms Submitted by: Rishi Jethwa Suvarna Angal.
Homework remarking requests BEFORE submitting a remarking request: a)read and understand our solution set (which is posted on the course web site) b)read.
CMSC 341 Disjoint Sets. 8/3/2007 UMBC CMSC 341 DisjointSets 2 Disjoint Set Definition Suppose we have an application involving N distinct items. We will.
CMSC 341 Disjoint Sets Textbook Chapter 8. Equivalence Relations A relation R is defined on a set S if for every pair of elements (a, b) with a,b  S,
Disjoint Sets Data Structure (Chap. 21) A disjoint-set is a collection  ={S 1, S 2,…, S k } of distinct dynamic sets. Each set is identified by a member.
Lecture X Disjoint Set Operations
Disjoint Sets Data Structure. Disjoint Sets Some applications require maintaining a collection of disjoint sets. A Disjoint set S is a collection of sets.
Union-find Algorithm Presented by Michael Cassarino.
Union Find ADT Data type for disjoint sets: makeSet(x): Given an element x create a singleton set that contains only this element. Return a locator/handle.
CSE373: Data Structures & Algorithms Lecture 11: Implementing Union-Find Nicki Dell Spring 2014.
CSE373: Data Structures & Algorithms Lecture 10: Implementing Union-Find Dan Grossman Fall 2013.
Nirmalya Roy School of Electrical Engineering and Computer Science Washington State University Cpt S 223 – Advanced Data Structures Disjoint Sets.
CSCE 411H Design and Analysis of Algorithms Set 7: Disjoint Sets Prof. Evdokia Nikolova* Spring 2013 CSCE 411H, Spring 2013: Set 7 1 * Slides adapted from.
Union & Find Problem 황승원 Fall 2010 CSE, POSTECH 2 2 Union-Find Problem Given a set {1, 2, …, n} of n elements. Initially each element is in a different.
Disjoint-Set Operation. p2. Disjoint Set Operations : MAKE-SET(x) : Create new set {x} with representative x. UNION(x,y) : x and y are elements of two.
CS 146: Data Structures and Algorithms July 16 Class Meeting Department of Computer Science San Jose State University Summer 2015 Instructor: Ron Mak
0 Union-Find data structure. 1 Disjoint set ADT (also Dynamic Equivalence) The universe consists of n elements, named 1, 2, …, n n The ADT is a collection.
CMSC 341 Disjoint Sets. 2 Disjoint Set Definition Suppose we have N distinct items. We want to partition the items into a collection of sets such that:
Chapter 21 Data Structures for Disjoint Sets Lee, Hsiu-Hui Ack: This presentation is based on the lecture slides from Prof. Tsai, Shi-Chun as well as various.
WEEK 5 The Disjoint Set Class Ch CE222 Dr. Senem Kumova Metin
CSE 373, Copyright S. Tanimoto, 2001 Up-trees - 1 Up-Trees Review of the UNION-FIND ADT Straight implementation with Up-Trees Path compression Worst-case.
How many edges does a tree of n nodes have? 1.2n-4 2.n(n-1)/2 3.n-1 4.n/2 5.Between n and 2n.
CSE 373: Data Structures and Algorithms
Data Structures for Disjoint Sets
Disjoint Sets Data Structure
CSE 373, Copyright S. Tanimoto, 2001 Up-trees -
An application of trees: Union-find problem
CMSC 341 Disjoint Sets Based on slides from previous iterations of this course.
CSE373: Data Structures & Algorithms Lecture 11: Implementing Union-Find Linda Shapiro Spring 2016.
Disjoint Set Neil Tang 02/23/2010
Disjoint Set Neil Tang 02/26/2008
CSCE 411 Design and Analysis of Algorithms
CSE 332: Data Abstractions Union/Find II
Union-Find Partition Structures
Union-Find Partition Structures
CMSC 341 Disjoint Sets.
CSE 326 Union Find Richard Anderson Text Chapter CSE 326.
Disjoint Sets Given a set {1, 2, …, n} of n elements.
Disjoint Sets DS.S.1 Chapter 8 Overview Dynamic Equivalence Classes
Disjoint Sets Given a set {1, 2, …, n} of n elements.
CMSC 341 Disjoint Sets.
Running Time Analysis Union is clearly a constant time operation.
Kruskal’s algorithm for MST and Special Data Structures: Disjoint Sets
Union-Find Problem Given a set {1, 2, …, n} of n elements.
Disjoint Sets Data Structure (Chap. 21)
An application of trees: Union-find problem
Disjoint Sets Textbook Chapter 8
CSE 373: Data Structures and Algorithms
Presentation transcript:

Lecture 16: Union and Find for Disjoint Data Sets Shang-Hua Teng

Union-Find Problem Given a set {1, 2, …, n} of n elements Initially each element is in a different set –{1}, {2}, …, {n} An intermixed sequence of union and find operations is performed A union operation combines two sets into one –Each of the n elements is in exactly one set at any time –Can be proven by induction A find operation identifies the set that contains a particular element Application – Equivalence Class

Disjoint Sets Suppose we have N distinct items. We want to partition the items into a collection of sets such that: –each item is in a set –no item is in more than one set Examples –BU students according to majors, or –BU students according to GPA, or –Graph vertices according to connected components The resulting sets are said to be disjoint sets.

Disjoint sets Set : a collection of (distinguishable) elements Two sets are disjoint if they have no common elements Disjoint-set data structure: –maintains a collection of disjoint sets –each set has a representative element –supported operations: MakeSet(x) Find(x) Union(x,y)

Disjoint sets MakeSet(x) –Given object x, create a new set whose only element (and representative) is pointed to by x Find(x) –Given object x, return (a pointer to) the representative of the set containing x – Assumption: there is a pointer to each x so we never have to look for an element in the structure

Disjoint sets Union(x,y) –Given two elements x, y, merge the disjoint sets containing them. –The original sets are destroyed. –The new set has a new representative (usually one of the representatives of the original sets) –At most n-1 Unions can be performed where n is the number of elements (why?)

Union-Find Algorithms Disjoint set algorithms are sometimes called union-find algorithms.

Disjoint Set Example Find the connected components of the undirected graph G=(V,E) (maximal subgraphs that are connected). for (each vertex v in V) Makeset(v): put v in its own set for (each edge (u,v) in E) if (find(u) ~= find(v)) union(u,v) Now we can find if two vertices x and y are in the same connected component by testing find(x) == find(y)

Disjoint sets -- implementation In the discussion that follows: –n is the total number of elements (in all sets). –m is the total number of operations performed

Disjoint Sets:Implementation #1 Using linked lists: –The first element of the list is the representative –Each node contains: an element a pointer to the next node in the list a pointer to the representative

Disjoint Sets: Implementation#1 Using linked lists: –MakeSet(x) Create a list with only one node, for x Time O(1) –Find(x) Return the pointer to the representative (assuming you are pointing at the x node) Time O(1)

Disjoint Sets:Implementation#1 Using linked lists: –Union(x,y) 1. Append y’s list to x’s list. 2. Pick x as a representative 3. Update y’s “representative” pointers A sequence of m operations may take O(m 2 ) time Improvement: let each representative keep track of the length of its list and always append the shorter list to the longer one. –Now, a sequence of m operations takes O(m+nlgn) time (why?)

Disjoint Sets:Implementation#1 An Improvement Let each representative keep track of the length of its list and always append the shorter list to the longer one. Theorem: Any sequence of m operations takes O(m+n log n) time.

A Tight Bound O(n + u log u + f), where u and f are, respectively, the number of union and find operations in the sequence of requests Can we do better?

Up-Trees A simple data structure for implementing disjoint sets is the up-tree. A H W H, A and W belong to the same set. H is the representative X, B, R and F are in the same set. X is the representative B X R F

A Set As A Tree S = {2, 4, 5, 9, 11, 13, 30} Some possible tree representations:

Operations in Up-Trees Find is easy. Just follow pointer to representative element. The representative has no parent. find(x) 1. if (parent(x) exists)// not the root return(find(parent(x)); 2.else return (x); Worst case, height of the tree

Steps For find(i) Start at the node that represents element i and climb up the tree until the root is reached Return the element in the root To climb the tree, each node must have a parent pointer

Result Of A Find Operation find(i) is to identify the set that contains element i In most applications of the union-find problem, the user does not provide set identifiers The requirement is that find(i) and find(j) return the same value iff elements i and j are in the same set find(i) will return the element that is in the tree root

Possible Node Structure Use nodes that have two fields: element and parent  Use an array table[] such that table[i] is a pointer to the node whose element is i  To do a find(i) operation, start at the node given by table[i] and follow parent fields until a node whose parent field is null is reached  Return element in this root node

Example table[] (Only some table entries are shown.)

Better Representation Use an integer array parent[] such that parent[i] is the element that is the parent of element i parent[]

Union Union is more complicated. Make one representative element point to the other, but which way? Does it matter? In the example, some elements are now deeper away from the root

Union(H, X) A H W B X R F A H W B X R F X points to H B, R and F are now deeper H points to X A and W are now deeper

Union public union(rootA, rootB) {parent[rootB] = rootA;} Time Complexity: O(1)

A worse case for Union Union can be done in O(1), but may cause find to become O(n) A BCDE Consider the result of the following sequence of operations: Union (A, B) Union (C, A) Union (D, C) Union (E, D)

Two Heuristics There are two heuristics that improve the performance of union-find. –Union by weight or height –Path compression on find

Height Rule Make tree with smaller height a subtree of the other tree Break ties arbitrarily union(7,13)

Weight Rule Make tree with fewer number of elements a subtree of the other tree Break ties arbitrarily union(7,13)

Implementation Root of each tree must record either its height or the number of elements in the tree. When a union is done using the height rule, the height increases only when two trees of equal height are united. When the weight rule is used, the weight of the new tree is the sum of the weights of the trees that are united.

Height Of A Tree If we start with single element trees and perform unions using either the height or the weight rule. The height of a tree with p elements is at most floor (log 2 p) + 1. Proof is by induction on p.

Union by Weight Heuristic Always attach smaller tree to larger. union(x,y) rep_x = find(x); rep_y = find(y); if (weight[rep_x] < weight[rep_y]) A[rep_x] = rep_y; weight[rep_y] += weight[rep_x]; else A[rep_y] = rep_x; weight[rep_x] += weight[rep_y];

Performance w/ Union by Weight If unions are done by weight, the depth of any element is never greater than log n + 1. Inductive Proof: –Initially, ever element is at depth zero. –When its depth increases as a result of a union operation (it’s in the smaller tree), it is placed in a tree that becomes at least twice as large as before (union of two equal size trees). –How often can each union be done? -- lg n times, because after at most lg n unions, the tree will contain all n elements. Therefore, find becomes O(log n) when union by weight is used -- even without path compression.

Path Compression Each time we do a find on an element E, we make all elements on path from root to E be immediate children of root by making each element’s parent be the representative. find(x) if (A[x]<0) return(x); A[x] = find(A[x]); return (A[x]); When path compression is done, a sequence of m operations takes O(m log n) time. Amortized time is O(log n) per operation.

Path Compression find(1) Do additional work to make future finds easier a bc d e f g a, b, c, d, e, f, and g are subtrees

Path Compression Make all nodes on find path point to tree root. find(1) a bc d e f g a, b, c, d, e, f, and g are subtrees Makes two passes up the tree

Ackermann’s Functions The Ackermann’s function is the simplest example of a well-defined total function which is computable but not primitive recursive. "A function to end all functions" -- Gunter Dötzel. –1. If m = 0 then A(m, n) = n + 1 –2. If n = 0 then A(m, n) = A(m-1, 1) –3. Otherwise, A(m, n) = A(m-1, A(m, n-1)) The function f(n) = A(n, n) grows much faster than polynomials or exponentials or any function that you can imagine

Ackermann’s Function Ackermann’s function.  A(m,n) = 2 n, m = 1 and n >= 1  A(m,n) = A(m-1,2), m>= 2 and n = 1  A(m,n) = A(m-1,A(m,n-1)), m,n >= 2 Ackermann’s function grows very rapidly as m and n increase  A(2,4) = 2 65,536

Time Complexity Inverse of Ackermann’s function.   (n) = min{k>=1 | A(k,1) > n},  The inverse function grows very slowly   (n) < 5 until n = 2 A(4,1) + 1  A(4,1) >> For all practical purposes,  (n) < 5

Time Complexity Theorem 12.2 [Tarjan and Van Leeuwen] Let T(n,m) be the maximum time required to process any intermixed sequence of n finds and unions. T(n,m) = O(m  (n)) when we start with singleton sets and use either the weight or height rule for unions and any one of the path compression methods for a find.