CSE 332: Data Structures Disjoint Set Union/Find


CSE 332: Data Structures Disjoint Set Union/Find Richard Anderson Spring 2016

Announcements Reading for this lecture: Chapter 8.

Dijkstra’s Algorithm Review from last week. Assume all edges have non-negative cost.
S = {}; d[s] = 0; d[v] = infinity for v != s
While S != V:
  Choose v in V-S with minimum d[v]
  Add v to S
  For each w in the neighborhood of v:
    d[w] = min(d[w], d[v] + c(v, w))
(Example graph shown on slide.)
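The review pseudocode above can be sketched as runnable Python using a binary heap (heapq). The tiny example graph is hypothetical, not the one drawn on the slide.

```python
import heapq

def dijkstra(graph, s):
    """Single-source shortest paths; graph maps u -> list of (v, cost).
    Assumes all edge costs are non-negative."""
    dist = {v: float('inf') for v in graph}
    dist[s] = 0
    pq = [(0, s)]                      # heap of (d[v], v) candidates
    while pq:
        d, u = heapq.heappop(pq)       # u = unfinished vertex with min d
        if d > dist[u]:
            continue                   # stale heap entry; skip
        for v, c in graph[u]:
            if d + c < dist[v]:        # relax edge (u, v)
                dist[v] = d + c
                heapq.heappush(pq, (dist[v], v))
    return dist

# Hypothetical graph: s --1--> u --2--> x, plus a direct s --4--> x
g = {'s': [('u', 1), ('x', 4)], 'u': [('x', 2)], 'x': []}
```

The heap replaces the "choose v with minimum d[v]" scan, which is what gives the O(m log n) runtime.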

Simulate Dijkstra’s algorithm (starting from s) on the example graph: record, for each of rounds 1–5, the vertex added to S and the current distances d[v] for vertices s, a, b, c, d. (Edge-weighted graph and table shown on slide.)

http://www.cs.utexas.edu/users/EWD/ Edsger Wybe Dijkstra was one of the most influential members of computing science’s founding generation. Among the domains in which his scientific contributions are fundamental are: algorithm design, programming languages, program design, operating systems, distributed processing, formal specification and verification, and the design of mathematical arguments.

Why do we worry about negative cost edges? With a negative edge, a vertex already added to S might later be reachable by a cheaper path, so the greedy choice is no longer safe. (Example graph shown on slide.)

Graph Algorithms / Data Structures Dijkstra’s Algorithm for Shortest Paths Heaps, O(m log n) runtime Kruskal’s Algorithm for Minimum Spanning Tree Union-Find data structure

Making Connections You have a set of nodes (numbered 1-9) on a network. You are given a sequence of pairwise connections between them: 3-5 4-2 1-6 5-7 4-8 3-7 Q: Are nodes 2 and 4 (indirectly) connected? Q: How about nodes 3 and 8? Q: Are any of the paired connections redundant due to indirect connections? Q: How many sub-networks do you have?

Making Connections Answering these questions is much easier if we create disjoint sets of nodes that are connected: Start: {1} {2} {3} {4} {5} {6} {7} {8} {9} 3-5 4-2 1-6 5-7 4-8 3-7 Q: Are nodes 2 and 4 (indirectly) connected? Q: How about nodes 3 and 8? Q: Are any of the paired connections redundant due to indirect connections? Q: How many sub-networks do you have?
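As a sketch of the disjoint-set approach, here is a minimal Python union-find answering the four questions. The helper make_uf and its closure-based interface are illustrative choices, not from the slides (no balancing or path compression yet).

```python
def make_uf(n):
    """Naive union-find over elements 1..n (no balancing yet)."""
    parent = {i: i for i in range(1, n + 1)}
    def find(x):                      # follow parents up to the root
        while parent[x] != x:
            x = parent[x]
        return x
    def union(x, y):                  # returns False if already connected
        rx, ry = find(x), find(y)
        if rx == ry:
            return False              # redundant connection
        parent[rx] = ry
        return True
    return find, union

find, union = make_uf(9)
pairs = [(3, 5), (4, 2), (1, 6), (5, 7), (4, 8), (3, 7)]
redundant = [p for p in pairs if not union(*p)]
subnets = len({find(i) for i in range(1, 10)})
```

So 2 and 4 are connected, 3 and 8 are not, the 3-7 connection is redundant, and four sub-networks remain: {3,5,7}, {2,4,8}, {1,6}, {9}.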

Applications of Disjoint Sets Maintaining disjoint sets in this manner arises in a number of areas, including: Networks Transistor interconnects Compilers Image segmentation Building mazes (this lecture) Graph problems Minimum Spanning Trees (upcoming topic in this class)

Disjoint Set ADT Data: a set of pairwise disjoint sets. Required operations: Union – merge two sets to create their union; Find – determine which set an item appears in. A common operation sequence connects two elements if not already connected: if (Find(x) != Find(y)) then Union(x,y)

Disjoint Sets and Naming Maintain a set of pairwise disjoint sets. {3,5,7} , {4,2,8}, {9}, {1,6} Each set has a unique name: one of its members (for convenience)

Union Union(x,y) – take the union of the two sets named x and y. {3,5,7}, {4,2,8}, {9}, {1,6} Union(5,1) → {3,5,7,1,6}, {4,2,8}, {9}

Find Find(x) – return the name of the set containing x. {3,5,7,1,6}, {4,2,8}, {9} Find(1) = 5 Find(4) = 8

Nifty Application: Building Mazes Idea: Build a random maze by erasing walls.

Building Mazes Pick Start and End Start End

Building Mazes Repeatedly pick random walls to delete. Start End

Desired Properties None of the boundary is deleted (except at “start” and “end”). Every cell is reachable from every other cell. There are no cycles – no cell can reach itself by a path unless it retraces some part of the path.

A Cycle Start End

A Good Solution Start End

A Hidden Tree Start End

Number the Cells We start with disjoint sets S = { {1}, {2}, {3}, {4}, …, {36} }, one per cell, and all possible walls between neighbors W = { (1,2), (1,7), (2,8), (2,3), … }, 60 walls total. (6×6 grid of cells numbered 1–36, with Start at the top-left and End at the bottom-right.) Idea: union-find operations will be done on cells.

Maze Building with Disjoint Union/Find Algorithm sketch:
1. Choose a wall at random. (Boundary walls are not in the wall list, so they are left alone.)
2. Erase the wall if the neighbors are in disjoint sets. (Avoids cycles.)
3. Take the union of those sets.
4. Go to 1; iterate until there is only one set. (Then every cell is reachable from every other cell.)

Pseudocode
S = set of sets of connected cells; initialize to {{1}, {2}, …, {n}}
W = set of walls; initialize to the set of all interior walls {(1,2), (1,7), …}
Maze = set of walls in the maze (initially empty)
While there is more than one set in S:
  Pick a random non-boundary wall (x,y) and remove it from W
  u = Find(x); v = Find(y)
  if u ≠ v then Union(u,v)
  else add wall (x,y) to Maze
Add the remaining members of W to Maze
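A runnable sketch of this pseudocode, shrunk to a hypothetical 2×2 grid with a hardcoded interior wall list (the name build_maze and the fixed seed are illustrative):

```python
import random

def build_maze(n_cells, walls, seed=0):
    """Erase a wall only if it joins two not-yet-connected cells;
    walls that would create a cycle are kept in the maze."""
    rng = random.Random(seed)
    parent = list(range(n_cells + 1))  # naive union-find, cells 1..n
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    maze = []
    shuffled = walls[:]
    rng.shuffle(shuffled)              # "pick a random wall"
    for x, y in shuffled:
        u, v = find(x), find(y)
        if u != v:
            parent[u] = v              # erase the wall: union the sets
        else:
            maze.append((x, y))        # keep it: cells already connected
    return maze

# Hypothetical 2x2 grid, cells 1..4; the four interior walls
walls = [(1, 2), (1, 3), (2, 4), (3, 4)]
maze = build_maze(4, walls)
```

Each erased wall merges two sets, so exactly n_cells − 1 walls are erased and the remaining len(walls) − (n_cells − 1) walls end up in the maze.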

Example Step: pick wall (8,14). S = {1,2,7,8,9,13,19} {3} {4} {5} {6} {10} {11,17} {12} {14,20,26,27} {15,16,21} … {22,23,24,29,30,32,33,34,35,36} (6×6 grid with walls erased so far shown on slide.)

Example Find(8) = 7; Find(14) = 20; Union(7,20). S becomes {1,2,7,8,9,13,19,14,20,26,27} {3} {4} {5} {6} {10} {11,17} {12} {15,16,21} … {22,23,24,29,30,32,33,34,35,36}

Example Pick wall (19,20). S = {1,2,7,8,9,13,19,14,20,26,27} {3} {4} {5} {6} {10} {11,17} {12} {15,16,21} … {22,23,24,29,30,32,33,34,35,36} Cells 19 and 20 are already in the same set, so the wall is kept and added to Maze. (Grid shown on slide.)

Example at the End S = {1,2,3,4,5,6,7,…,36}: a single set. The finished maze consists of the walls previously added to Maze plus the remaining walls in W. (Final grid shown on slide.)

Data structure for disjoint sets? Represent: {3,5,7} , {4,2,8}, {9}, {1,6} Support: find(x), union(x,y)

Union/Find Trade-off Known result: Find and Union cannot both be done in worst-case O(1) time with any data structure. We will instead aim for good amortized complexity. For m operations on n elements: Target complexity: O(m) i.e. O(1) amortized

Tree-based Approach Each set is a tree Root of each tree is the set name. Allow large fanout (why?)

Up-Tree for DS Union/Find Observation: we only traverse these trees upward from a given node to find the root. Idea: reverse the pointers (make them point up from child to parent). The result is an up-tree. Initial state: each of 1–7 is a singleton root. Intermediate state: a forest of up-trees whose roots (here 1, 3, and 7) are the names of the sets.

Find Operation Find(x): follow parent pointers from x up to the root and return the root. (Forest diagram on slide.)

Union Operation Union(i, j): assuming i and j are roots, point i at j. (Forest diagram on slide.)

Simple Implementation Assume that the keys are numbered 1…n or 0…n-1. Use an array of indices, where up[x] = -1 means x is a root:
x:   1  2  3  4  5  6  7
up: -1  1 -1  7  7  5 -1

Implementation
void Union(int x, int y) {
  assert(up[x] < 0 && up[y] < 0);  // x and y must be roots
  up[x] = y;
}
int Find(int x) {
  while (up[x] >= 0) {
    x = up[x];
  }
  return x;
}
Runtime for Union: Θ(1). Runtime for Find: Θ(depth). Find runs in time linear in the maximum depth of an up-tree, and with no balancing the maximum depth can be n − 1, so Find is Θ(n) in the worst case. Amortized complexity is no better.

A Bad Case Union(1,2), then Union(2,3), …, then Union(n-1,n): each union points the old root at the newly added element, producing one long chain 1 → 2 → … → n. Find(1) then takes n steps!!
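The bad case is easy to reproduce with a direct Python translation of the naive array implementation (the step counter in find is added just to measure the path length):

```python
def make_naive(n):
    """Naive up-tree forest over 1..n; up[x] == -1 marks a root."""
    up = [-1] * (n + 1)
    def union(x, y):                  # assumes x and y are roots
        up[x] = y
    def find(x):                      # returns (root, pointers followed)
        steps = 0
        while up[x] != -1:
            x = up[x]
            steps += 1
        return x, steps
    return union, find

n = 100
union, find = make_naive(n)
for i in range(1, n):
    union(i, i + 1)                   # Union(1,2), Union(2,3), ..., Union(n-1,n)
root, steps = find(1)                 # walks the entire chain
```

Find(1) follows n − 1 pointers: linear, not constant.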

Can You do Better?

Two Big Improvements Can we do better? Yes!
Union-by-size: improve Union so that Find takes worst-case time of only Θ(log n).
Path compression: improve Find so that, with union-by-size, Find takes amortized time of almost Θ(1).

Union-by-Size Always point the root of the smaller tree at the root of the larger tree. (Example on slide: S-Union(7,1) attaches the size-2 tree rooted at 1 to the size-4 tree rooted at 7.)

Example Again With union-by-size, the same sequence S-Union(2,3); …; S-Union(n-1,n) keeps every tree shallow: each singleton is attached directly to the root, so Find(1) takes constant time.

Analysis of Union-by-Size Theorem: with union-by-size, an up-tree of height h has size at least 2^h. Proof by induction. Base case: h = 0; the up-tree has one node, and 2^0 = 1. Inductive hypothesis: assume the claim holds for h − 1. Observation: a tree gets taller only as the result of a union, and it reaches height h only when the attached tree T1 has height h − 1. By union-by-size, the tree T2 it is attached to is at least as large as T1, so S(T) = S(T1) + S(T2) ≥ 2^(h−1) + 2^(h−1) = 2^h.

Analysis of Union-by-Size What is the worst-case complexity of Find(x) in an up-tree forest of n nodes? (Amortized complexity is no better.) With all nodes in one tree of height h, n ≥ 2^h, so h ≤ log2 n. Aside: union-by-height is equally good; updating the height is just height = max(old_height, added_subtree_height + 1). When doing path compression, don’t update the height; call it “rank” instead, but follow the same update rule: rank = max(old_rank, added_subtree_rank + 1). Union-by-size is sometimes called weighted union. (Hmm, is union-by-height/rank also called weighted union?) (Hmm, do the Ackermann-function results hold for union-by-size? Weiss suggests they do.)

Worst Case for Union-by-Size First do n/2 unions-by-size of singletons, then n/4 unions-by-size of the resulting size-2 trees, and so on.

Example of Worst Case (cont’) After n − 1 = n/2 + n/4 + … + 1 unions-by-size, a Find can take log2 n steps: if there are n = 2^k nodes, then the longest path from leaf to root has length k.

Array Implementation Can store a separate size array (meaningful only at roots):
x:    1  2  3  4  5  6  7
up:  -1  1 -1  7  7  5 -1
size: 2     1           4

Elegant Array Implementation Better: store the sizes in the up array, as negative values at the roots:
x:   1  2  3  4  5  6  7
up: -2  1 -1  7  7  5 -4
Negative up-values are the (negated) sizes of the roots.

Code for Union-by-Size
S-Union(i,j) {
  // Collect sizes
  si = -up[i]; sj = -up[j];
  // Verify i and j are roots (roots store negative sizes)
  assert(si > 0 && sj > 0);
  // Point the smaller tree at the root of the larger; update the size
  if (si < sj) {
    up[i] = j;
    up[j] = -(si + sj);
  } else {
    up[j] = i;
    up[i] = -(si + sj);
  }
}
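A Python translation of S-Union with the same negative-size encoding. The driver loop below is an illustrative worst case: it repeatedly merges equal-sized trees, so with n = 1024 the deepest node ends up at depth exactly log2 n = 10.

```python
def s_union(up, i, j):
    """Union-by-size; up[root] holds -(size of that root's tree)."""
    si, sj = -up[i], -up[j]
    assert si > 0 and sj > 0, "i and j must be roots"
    if si < sj:
        up[i] = j                     # smaller tree points at larger
        up[j] = -(si + sj)
    else:
        up[j] = i
        up[i] = -(si + sj)

def depth(up, x):
    """Number of pointers followed from x to its root."""
    d = 0
    while up[x] >= 0:
        x, d = up[x], d + 1
    return d

n = 1024
up = [-1] * (n + 1)                   # cells 1..n, all singleton roots
roots = list(range(1, n + 1))
while len(roots) > 1:                 # merge equal-sized trees pairwise
    nxt = []
    for a, b in zip(roots[0::2], roots[1::2]):
        s_union(up, a, b)
        nxt.append(a if up[a] < 0 else b)
    roots = nxt
```

After ten rounds of merges, a single tree of size 1024 remains and no Find needs more than 10 steps, matching the Θ(log n) bound.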

Path Compression To improve the amortized complexity, we’ll borrow an idea from splay trees: when going up the tree, improve the nodes on the path! On a Find operation, point all the nodes on the search path directly at the root. This is called “path compression.” (Diagram on slide: PC-Find(3) re-points every node on the path from 3 directly at the root.)

Self-Adjustment Works PC-Find(x): the entire search path from x is re-pointed at the root. (Diagram on slide.)

Exercise: draw the result of Find(5), showing the array before, the new tree, and the new array. Note that everything along the path gets pointed at the root 3. Note also that this is definitely not a binary tree! (Tree on slide: nodes 1–9 with root 3.)

Code for Path Compression Find
PC-Find(i) {
  // Pass 1: find the root
  j = i;
  while (up[j] >= 0) {
    j = up[j];
  }
  root = j;
  // Pass 2: compress the path
  if (i != root) {
    parent = up[i];
    while (parent != root) {
      up[i] = root;
      i = parent;
      parent = up[parent];
    }
  }
  return(root);
}
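A compact Python version of the same two-pass PC-Find on the array encoding; the four-node chain is a hypothetical example.

```python
def pc_find(up, i):
    """Find with path compression: after the call, every node on the
    search path points directly at the root."""
    root = i
    while up[root] >= 0:              # pass 1: locate the root
        root = up[root]
    while i != root:                  # pass 2: re-point the path
        up[i], i = root, up[i]
    return root

# Hypothetical chain 4 -> 3 -> 2 -> 1 (up[x] is x's parent; -1 marks a root)
up = [-1, -1, 1, 2, 3]                # index 0 unused
r = pc_find(up, 4)
```

One call flattens the whole chain: nodes 2, 3, and 4 all point directly at root 1 afterwards.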

Complexity of Union-by-Size + Path Compression Worst-case time for a single Union-by-size: Θ(1). Worst-case time for a single PC-Find: Θ(log n). The time complexity for m ≥ n operations on n elements has been shown to be O(m log* n). [See Weiss for the proof.] Amortized complexity is then O(log* n). What is log*?

log* n log* n = the number of times you need to apply log2 to bring the value down to at most 1.
log* 2 = 1
log* 4 = log* 2^2 = 2
log* 16 = log* 2^(2^2) = 3 (log log log 16 = 1)
log* 65536 = log* 2^(2^(2^2)) = 4 (log log log log 65536 = 1)
log* 2^65536 ≈ log* (2 × 10^19,728) = 5
log* n ≤ 5 for all reasonable n.
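The iterated logarithm is easy to compute directly; a small Python sketch:

```python
from math import log2

def log_star(n):
    """Iterated logarithm: how many times log2 must be applied
    to bring n down to at most 1."""
    count = 0
    while n > 1:
        n = log2(n)
        count += 1
    return count
```

It grows absurdly slowly: even 2^65536 only reaches 5.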

The Tight Bound In fact, Tarjan showed that the time complexity for m ≥ n operations on n elements is Θ(m α(m, n)). Amortized complexity is then Θ(α(m, n)). What is α(m, n)? The inverse of Ackermann’s function. For reasonable values of m and n it grows even more slowly than log* n, so it’s even “more constant.” The proof is beyond the scope of this class. A simple algorithm can lead to incredibly hardcore analysis!