CSE 326: Data Structures Lecture #19 Disjoint Sets Dynamic Equivalence Weighted Union & Path Compression David Kaplan davek@cs

Presentation transcript:

CSE 326: Data Structures Lecture #19 Disjoint Sets Dynamic Equivalence Weighted Union & Path Compression David Kaplan davek@cs Today we’re going to get to know the dynamic duo of union/find. But first…

Today’s Outline
- Making a “good” maze
- Disjoint Set Union/Find ADT
- Up-trees
- Maze revisited
- Weighted Union
- Path Compression
- An amazing complexity analysis
We’ll finish up extendible hashing and summarize the advantages/disadvantages of hash tables. Then, let’s look at a motivating problem. Next, we’ll hit the ADT. Then the data structure and the dynamic duo.

The Maze Construction Problem
Represent the maze environment as a graph {V, E}:
- collection of rooms: V
- connections between rooms (initially all closed): E
Construct a maze:
- collection of rooms: $V' = V$
- designated rooms in, $i \in V$, and out, $o \in V$
- collection of connections to knock down: $E' \subseteq E$, such that one unique path connects every two rooms
Here’s a formalism which tries to approach that.

The Middle of the Maze
So far, some walls have been knocked down while others remain. Now, we consider the wall between A and B. Should we knock it down?
- no, if A and B are otherwise connected
- yes, if A and B are not otherwise connected
[Figure: partially built maze with adjacent rooms A and B.]
Let’s say we’re in the middle of knocking out walls to make a good maze. How do we decide whether to knock down the current wall?

Maze Construction Algorithm
While edges remain in E:
  Remove a random edge e = (u, v) from E
  If u and v have not yet been connected:
    add e to E'
    mark u and v as connected
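The loop above can be sketched in C++ as follows. This is a minimal illustration, not the lecture's code: the names `RoomSets` and `buildMaze`, the numbering of rooms as 0..n-1, and the use of a shuffled wall list to stand in for "remove a random edge" are all assumptions. The connectivity test uses a plain parent-pointer forest with no balancing.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: a minimal parent-pointer forest over rooms 0..n-1.
struct RoomSets {
    std::vector<int> up;  // up[i] = parent of room i, or -1 if i is a root
    explicit RoomSets(int n) : up(n, -1) {}

    // Follow parent links to the root, which names the set.
    int find(int x) const {
        while (up[x] != -1) x = up[x];
        return x;
    }

    // Merge two sets given their (distinct) roots.
    void unite(int rootX, int rootY) {
        assert(up[rootX] == -1 && up[rootY] == -1 && rootX != rootY);
        up[rootY] = rootX;
    }
};

// Returns E': the walls to knock down. If the room graph is connected,
// the result joins every pair of rooms by exactly one path.
std::vector<std::pair<int, int>> buildMaze(int numRooms,
                                           std::vector<std::pair<int, int>> walls,
                                           unsigned seed) {
    std::mt19937 rng(seed);
    // Visiting the walls in shuffled order plays the role of
    // "remove a random edge from E" on each iteration.
    std::shuffle(walls.begin(), walls.end(), rng);

    RoomSets sets(numRooms);
    std::vector<std::pair<int, int>> knockedDown;
    for (const auto& wall : walls) {
        int rootU = sets.find(wall.first);
        int rootV = sets.find(wall.second);
        if (rootU != rootV) {            // u and v are not yet connected
            knockedDown.push_back(wall); // add e to E'
            sets.unite(rootU, rootV);    // mark u and v as connected
        }
    }
    return knockedDown;
}
```

For a connected grid of n rooms, the returned list always has n - 1 walls, since each accepted wall merges two previously separate groups of rooms.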

Equivalence Relations
An equivalence relation R has three properties:
- reflexive: for any x, xRx is true
- symmetric: for any x and y, xRy implies yRx
- transitive: for any x, y, and z, xRy and yRz implies xRz
Connection between rooms is an equivalence relation (call it C). For any rooms a, b, c:
- a C a (A room connects to itself!)
- If a C b, then b C a
- If a C b and b C c, then a C c
Examples of other equivalence relations?

Disjoint Set Union/Find ADT
Union/Find operations: create, destroy, union, find
Disjoint set equivalence property: every element of a DS U/F structure belongs to exactly one set
Dynamic equivalence property: the set of an element can change after execution of a union
Example: {1,4,8} {7} {6} {5,9,10} {2,3}
- find(4) returns 8
- union(2,6) produces {2,3,6}

Disjoint Set Union/Find More Formally
Given a set $U = \{a_1, a_2, \ldots, a_n\}$
Maintain a partition of U, a set of subsets of U $\{S_1, S_2, \ldots, S_k\}$ such that:
- each pair of subsets $S_i$ and $S_j$ are disjoint: $S_i \cap S_j = \emptyset$ for $i \ne j$
- together, the subsets cover U: $\bigcup_{i=1}^{k} S_i = U$
- The $S_i$ are mutually exclusive and collectively exhaustive
Union(a, b) creates a new subset which is the union of a’s subset and b’s subset
Find(a) returns a unique name for a’s subset
Note: we’ll assume that a union is actually passed the names of the subsets. So, there are no hidden finds inside a union.
NOTE: set names are arbitrary! We only care that: Find(a) == Find(b) $\Leftrightarrow$ a and b are in the same subset
NOTE: outside agent must decide when/what to union; ADT is just the bookkeeper.
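The partition conditions (pairwise disjoint, collectively covering U) can be checked mechanically. The following is a small sketch under stated assumptions: the elements of U are represented by the indices 0..n-1, and `isPartition` is a hypothetical helper name, not part of the ADT described in the lecture.

```cpp
#include <vector>

// Returns true if `subsets` is a partition of U = {0, 1, ..., n-1},
// that is, every element of U appears in exactly one subset.
bool isPartition(int n, const std::vector<std::vector<int>>& subsets) {
    std::vector<int> count(n, 0);  // count[a] = number of times a appears
    for (const auto& subset : subsets) {
        for (int a : subset) {
            if (a < 0 || a >= n) return false;  // not an element of U
            ++count[a];
        }
    }
    for (int c : count) {
        // c == 0: the subsets do not cover U (not collectively exhaustive).
        // c > 1:  the element appears more than once (not mutually exclusive).
        if (c != 1) return false;
    }
    return true;
}
```

For example, with n = 4, the subsets {0,3}, {1}, {2} form a partition, while {0,1}, {1,2,3} do not (1 appears twice) and {0}, {1}, {2} do not (3 is missing).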

Example
Construct the maze on the right.
[Figure: 3 × 3 grid of rooms a through i, with the walls between adjacent rooms numbered 1 to 12.]
Initial state (set names underlined): {a}{b}{c}{d}{e}{f}{g}{h}{i}
Maze constructor (outside agent) traverses edges in numeric order and decides whether to union

Example, First Step
{a}{b}{c}{d}{e}{f}{g}{h}{i}
find(b) $\Rightarrow$ b
find(e) $\Rightarrow$ e
find(b) $\ne$ find(e), so:
- add 1 to E'
- union(b, e)
{a}{b,e}{c}{d}{f}{g}{h}{i}
[Figure: 3 × 3 grid of rooms a through i with walls numbered 1 to 12. Order of edges in blue.]

Up-Tree Intuition
Finding the representative member of a set is somewhat like the opposite of finding whether a given item exists in a set. So, instead of using trees with pointers from each node to its children, let’s use trees with a pointer from each node to its parent.

Up-Tree Union-Find Data Structure
- Each subset is an up-tree with its root as its representative member
- All members of a given set are nodes in that set’s up-tree
- Hash table maps input data to the node associated with that data
[Figure: forest of up-trees over nodes a through i.]
Up-trees are not necessarily binary!

Find
find(f), find(e)
Just traverse to the root!
runtime: O(depth)
[Figure: maze grid and up-tree forest over nodes a through i.]

Union
union(a,c)
Just hang one root from the other!
runtime: O(1)
[Figure: maze grid and up-tree forest over nodes a through i, before and after the union.]

The Whole Example (1/11)
union(b,e)
[Figure: maze grid and up-tree forest before and after this step.]

The Whole Example (2/11)
union(a,d)
[Figure: maze grid and up-tree forest before and after this step.]

The Whole Example (3/11)
union(a,b)
[Figure: maze grid and up-tree forest before and after this step.]

The Whole Example (4/11)
find(d) = find(e). No union!
[Figure: maze grid and up-tree forest.]
While we’re finding e, could we do anything else?

The Whole Example (5/11)
union(h,i)
[Figure: maze grid and up-tree forest before and after this step.]

The Whole Example (6/11)
union(c,f)
[Figure: maze grid and up-tree forest before and after this step.]

The Whole Example (7/11)
find(e), find(f), union(a,c)
[Figure: maze grid and up-tree forest.]
Could we do a better job on this union?

The Whole Example (8/11)
find(f), find(i), union(c,h)
[Figure: maze grid and up-tree forest before and after this step.]

The Whole Example (9/11)
find(e) = find(h) and find(b) = find(c). So, no unions for either of these.
[Figure: maze grid and up-tree forest.]

The Whole Example (10/11)
find(d), find(g), union(c, g)
[Figure: maze grid and up-tree forest before and after this step.]

The Whole Example (11/11)
find(g) = find(h). So, no union. And, we’re done!
[Figure: finished maze and final up-tree.]
Ooh… scary! Such a hard maze!

DS/DE data structure
A forest of up-trees can easily be stored in an array. Also, if the node names are integers or characters, we can use a very simple, perfect hash.
Nifty storage trick!
[Figure: forest of up-trees over nodes a through i.]
index:     0 (a)  1 (b)  2 (c)  3 (d)  4 (e)  5 (f)  6 (g)  7 (h)  8 (i)
up-index:   -1      0     -1      0      1      2     -1     -1      7
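As a sketch of the storage trick, the forest can live in a fixed-size array indexed by a trivial perfect hash on single-letter names. The helper names (`indexOf`, `makeForest`, `findRoot`, `unite`) are illustrative assumptions, as is the restriction to the nine rooms 'a' through 'i' from the maze example.

```cpp
#include <array>
#include <cassert>

constexpr int kRooms = 9;  // rooms 'a' .. 'i'
using UpArray = std::array<int, kRooms>;

// Perfect hash for single-letter names: 'a' -> 0, 'b' -> 1, ..., 'i' -> 8.
int indexOf(char name) {
    assert(name >= 'a' && name < 'a' + kRooms);
    return name - 'a';
}

// Every room starts as the root of its own one-node up-tree (-1 = no parent).
UpArray makeForest() {
    UpArray up;
    up.fill(-1);
    return up;
}

// Follow up-indices from the named node to its root.
int findRoot(const UpArray& up, char name) {
    int i = indexOf(name);
    while (up[i] != -1) i = up[i];
    return i;
}

// Hang root y under root x (both arguments must currently be distinct roots).
void unite(UpArray& up, char x, char y) {
    assert(x != y);
    assert(up[indexOf(x)] == -1 && up[indexOf(y)] == -1);
    up[indexOf(y)] = indexOf(x);
}
```

Applying the first five unions of the maze example in order, union(b,e), union(a,d), union(a,b), union(h,i), union(c,f), leaves the array holding -1, 0, -1, 0, 1, 2, -1, -1, 7 for rooms a through i.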

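Implementation

typedef int ID;

// hTable (maps input data to a node ID) and up (parent index, -1 at a root)
// are the shared structures from the previous slide.

// Follow parent links from x's node up to the root, which names the set.
ID find(Object x) {
  assert(hTable.contains(x));
  ID xID = hTable[x];
  while (up[xID] != -1) {
    xID = up[xID];
  }
  return xID;
}
runtime: O(depth) or …

// Hang root y under root x and return the new root.
// Named unionSets because union is a reserved word in C and C++.
ID unionSets(ID x, ID y) {
  assert(up[x] == -1);
  assert(up[y] == -1);
  up[y] = x;
  return x;
}
runtime: O(1)

Find runs in time linear in the maximum depth of an up-tree. What is the maximum depth?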

Room for Improvement: Weighted Union
- Always makes the root of the larger tree the new root
- Often cuts down on height of the new up-tree
[Figure: the same two up-trees merged without and with weighted union.]
Could we do a better job on this union? Weighted union!
I use weighted because it makes the analysis a bit easier; your book suggests height-based union, and that’s probably fine.

Weighted Union Code

typedef int ID;

// Hang the root of the lighter tree under the root of the heavier one.
// weight[r] holds the number of nodes in the tree rooted at r.
// Named unionSets because union is a reserved word in C and C++.
ID unionSets(ID x, ID y) {
  assert(up[x] == -1);
  assert(up[y] == -1);
  if (weight[x] > weight[y]) {
    up[y] = x;
    weight[x] += weight[y];
    return x;
  } else {
    up[x] = y;
    weight[y] += weight[x];
    return y;
  }
}

new runtime of union: Union still takes constant time.
new runtime of find: Find hasn’t changed, so it still takes O(depth). But what is the maximum depth now?

Weighted Union Find Analysis
Finds with weighted union are O(max up-tree height).
But an up-tree of height h with weighted union must have at least $2^h$ nodes.
$\Rightarrow 2^{\text{max height}} \le n$ and max height $\le \log n$
So, find takes O(log n).
Proof by induction on h:
Base case: h = 0, tree has $2^0 = 1$ node.
Induction hypothesis: assume true for all h' < h.
A merge can only increase tree height by one over the smaller tree. So, a tree of height h-1 was merged with a larger tree to form the new tree. Each tree then has $\ge 2^{h-1}$ nodes (the smaller one by the induction hypothesis, the other because it is at least as large), for a total of at least $2^h$ nodes. QED.
When a node gets deeper, it only gets one deeper, and its up-tree must have been the smaller one in the merge. That means that its tree at least doubled in size. How many times can a tree double in size? Log n. So find takes O(log n).
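The bound can be checked on the merge pattern that makes it tight: repeatedly joining trees of equal weight. The sketch below is an illustration under assumptions, not the lecture's code; `WeightedForest` and `worstCaseHeight` are hypothetical names, elements are the integers 0..n-1, and no path compression is applied so that heights are preserved.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Up-tree forest with weighted union and no path compression.
struct WeightedForest {
    std::vector<int> up;      // parent index, -1 at a root
    std::vector<int> weight;  // number of nodes in the tree rooted here
    explicit WeightedForest(int n) : up(n, -1), weight(n, 1) {}

    int find(int x) const {
        while (up[x] != -1) x = up[x];
        return x;
    }

    // Number of parent links between x and its root.
    int depth(int x) const {
        int d = 0;
        while (up[x] != -1) { x = up[x]; ++d; }
        return d;
    }

    // Merge two distinct roots; the lighter tree hangs under the heavier one.
    void unite(int x, int y) {
        assert(up[x] == -1 && up[y] == -1 && x != y);
        if (weight[x] > weight[y]) { up[y] = x; weight[x] += weight[y]; }
        else                       { up[x] = y; weight[y] += weight[x]; }
    }

    int maxDepth() const {
        int best = 0;
        for (int i = 0; i < static_cast<int>(up.size()); ++i)
            best = std::max(best, depth(i));
        return best;
    }
};

// Merge blocks of size 1, then 2, then 4, ... so that (for n a power of two)
// every union joins two trees of equal weight. Returns the resulting height.
int worstCaseHeight(int n) {
    WeightedForest forest(n);
    for (int size = 1; size < n; size *= 2)
        for (int i = 0; i + size < n; i += 2 * size)
            forest.unite(forest.find(i), forest.find(i + size));
    return forest.maxDepth();
}
```

When n is a power of two, each round adds exactly one level, so the final height equals log n (base 2), matching the $2^h$ node bound with equality.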

Room for Improvement: Path Compression
- Points everything along the path of a find to the root
- Reduces the height of the entire access path to 1
[Figure: an up-tree before and after find(e) with path compression.]
While we’re finding e, could we do anything else? Path compression!

Path Compression Example
find(e)
[Figure: the up-tree rooted at c before and after find(e).]
Note that everything along the path got pointed to c. Note also that this is definitely not a binary tree!

Path Compression Code

typedef int ID;

ID find(Object x) {
  assert(hTable.contains(x));
  ID xID = hTable[x];
  ID hold = xID;
  // First pass: walk up to the root.
  while (up[xID] != -1) {
    xID = up[xID];
  }
  // Second pass: point every node on the access path directly at the root.
  while (up[hold] != -1) {
    ID temp = up[hold];
    up[hold] = xID;
    hold = temp;
  }
  return xID;
}

runtime: Find clearly still takes O(depth), right? But how deep can a tree get now?

Complexity of Weighted Union + Path Compression
Ackermann created a function A(x, y) which grows very fast!
Inverse Ackermann function $\alpha(x, y)$ grows very slooooowwwly …
For comparison, consider the iterated logarithm log* n, a simpler slow-growing function (it is not the inverse Ackermann function, which grows even more slowly).
How fast does log n grow? log n = 4 for n = 16
Let $\log^{(k)} n = \log(\log(\log \ldots (\log n)))$ (k logs)
Then, let log* n = minimum k such that $\log^{(k)} n \le 1$
How fast does log* n grow?
- log* n = 4 for n = 65536
- log* n = 5 for n = $2^{65536}$ (a roughly 20,000-digit number!)
How fast does $\alpha(x, y)$ grow? Even sloooowwwer than log* n
- $\alpha(x, y)$ = 4 for n far larger than the number of atoms in the universe ($2^{300}$)
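To make the growth rate of log* concrete, here is a small sketch that computes it for 64-bit inputs. It is an illustration under assumptions: `logStar` is a hypothetical name, logs are base 2, and instead of taking repeated floating-point logarithms it uses the equivalent test that $\log^{(k)} n \le 1$ exactly when n is at most a tower of k twos (1, 2, 4, 16, 65536, $2^{65536}$, …).

```cpp
// Iterated logarithm, base 2: the minimum k such that applying log2
// k times to n yields a value <= 1.
int logStar(unsigned long long n) {
    int k = 0;
    unsigned long long tower = 1;  // a tower of k twos: 1, 2, 4, 16, 65536, ...
    while (n > tower) {
        // The next tower is 2^tower. Once tower >= 64, that value exceeds
        // every 64-bit integer, so one more log is always enough.
        if (tower >= 64) return k + 1;
        tower = 1ULL << tower;
        ++k;
    }
    return k;
}
```

Every 64-bit input larger than 65536 gives 5, which is why log* (and the even slower inverse Ackermann function) behaves like a small constant in practice.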

Complex Complexity of Weighted Union + Path Compression
Tarjan proved that m weighted union and find operations on a set of n elements have worst case complexity $O(m \cdot \alpha(m, n))$
This is essentially amortized constant time
In some practical cases, weighted union or path compression or both are unnecessary because trees do not naturally get very deep.
OK, now, your book shows a proof that find now takes amortized O(log* n), but actually it’s even better than that: O(alpha(m, n)) time. Basically amortized 1. Can anyone think why the alpha doesn’t matter even in any good theoretical sense? Remember B-Trees: they showed us that our model was bogus for large data sets. By the time alpha gets near 4, our data sets are already larger than we can even imagine.
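Putting the two improvements together, a combined structure might look like the sketch below. This is one possible arrangement, not the lecture's exact code: the class name `DisjointSets`, the method name `unite` (chosen because `union` is a reserved word in C++), and the use of integer elements 0..n-1 in place of the hash table are all assumptions. As in the slides, `unite` expects set names (roots), so callers run `find` first.

```cpp
#include <cassert>
#include <vector>

class DisjointSets {
public:
    // n singleton sets: every element is its own root with weight 1.
    explicit DisjointSets(int n) : up_(n, -1), weight_(n, 1) {}

    // Find with path compression: locate the root, then repoint every
    // node on the access path directly at it.
    int find(int x) {
        int root = x;
        while (up_[root] != -1) root = up_[root];
        while (up_[x] != -1) {
            int next = up_[x];
            up_[x] = root;
            x = next;
        }
        return root;
    }

    // Weighted union of two distinct roots; returns the new root.
    int unite(int x, int y) {
        assert(up_[x] == -1 && up_[y] == -1 && x != y);
        if (weight_[x] > weight_[y]) {
            up_[y] = x;
            weight_[x] += weight_[y];
            return x;
        }
        up_[x] = y;
        weight_[y] += weight_[x];
        return y;
    }

private:
    std::vector<int> up_;      // parent index, -1 at a root
    std::vector<int> weight_;  // node count, meaningful only at roots
};
```

A typical call sequence is `ds.unite(ds.find(a), ds.find(b))`, guarded by a check that the two roots differ, which is the same decision the maze constructor makes for each wall.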

Disjoint Set Union/Find ADT Summary
- Simple ADT, simple data structure, simple code
- Complex complexity analysis, but extremely useful result: essentially, constant time!
- Lots of potential applications
  - Object-property collections
  - Index partitions: e.g. parts inspection
  - To say nothing of maze construction
- In some applications, it may make sense to have meaningful (non-arbitrary) set names