Union-Find.

Union-Find

Quick Review: Connected Components Essentially answers the question “which nodes are reachable from here?”

Connected Components Not always obvious whether two nodes are in the same connected component Not always obvious what the connected components even are! Depends a lot on graph representation

Union-Find Given a set of elements S, a partition of S is a set of nonempty subsets of S such that every element of S is in exactly one of the subsets. If your elements are nodes in a graph, the sets of the partition can correspond to connected components, but the structure can be used for other things! A Union-Find (or Disjoint-Set) data structure is one that efficiently keeps track of such a partition. It supports two operations: Union, which merges two sets of the partition into one, and Find, which identifies which set of the partition a given element belongs to.
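A minimal sketch of what those two operations mean, using explicit Python sets to represent one partition of S = {1, 2, 3, 4, 5}. This shows only the semantics, not the efficient array representation the next slides build:

partition = [{1, 2}, {3}, {4, 5}]

def find(partition, e):
    # return the subset of the partition that contains element e
    for subset in partition:
        if e in subset:
            return subset
    raise KeyError(e)

def union(partition, e1, e2):
    # merge the two subsets containing e1 and e2 into a single subset
    s1, s2 = find(partition, e1), find(partition, e2)
    if s1 is not s2:
        partition.remove(s1)
        partition.remove(s2)
        partition.append(s1 | s2)

union(partition, 2, 3)                             # partition is now [{4, 5}, {1, 2, 3}]
print(find(partition, 1) == find(partition, 3))    # True: 1 and 3 are in the same set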

Representing Trees as Arrays Key fact: by definition, every tree node has exactly one parent (except the root). Big idea: assign each node to an array index, and store the index of the parent. [Slide figure: a small example tree and its array representation.] Notice: only the root node has its value equal to its index.
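A minimal sketch of that idea, using a small example tree of my own (not necessarily the one pictured on the slide):

#        2
#      /   \
#     1     3
#           |
#           4
# parent[i] holds the array index of node i's parent; the root stores its own index.
parent = [None, 2, 2, 2, 3]    # index 0 is unused so node labels match indices 1..4
assert parent[2] == 2          # only the root's value equals its own index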

Representing Forests as Arrays [Slide figure: a forest of nodes 1 through 9 and its array representation.] Again, only the roots have values equal to their array indices.

Representing Forests as Arrays How does this help with Union-Find?

Union-Find Can take advantage of the tree-as-array representation to map each node to a unique “component ID” The ID is the root of the tree The Find method can be implemented by walking up the tree Using Find on two different nodes can tell you if they’re in the same partition Note: this representation isn’t perfect Doesn’t easily let you walk down the tree Not all graphs/connected components can neatly map into a forest without loss of edges In other words, good for Union-Find but not a silver bullet for graph representation
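Read concretely, “walking up the tree” over the plain parent array might look like this iterative sketch (the slides’ own recursive version appears a few slides later):

def find_root(parent, e):
    # follow parent pointers until reaching a node that is its own parent
    while parent[e] != e:
        e = parent[e]
    return e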

Representing Forests as Arrays What about union?

Representing Forests as Arrays [Slide figures: successive animation frames showing the forest after a union, with one root now pointing to another.]

Union-Find The Union (merge) method is as easy as making the root of one tree a child of the root of the other tree. In our data structure, this just means changing a single value in the array. Union may not be called on the roots themselves, so generally Union requires calls to Find.

Union-Find: Implementation

# `s` is a list representing the partitions
# `e` is the ID of a particular element
def find(s, e):
    p = s[e]
    if e == p:
        return p
    return find(s, p)
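A concrete example of the array layout this assumes (the forest here is my own small example): s[i] holds the parent of element i, and a root holds its own index.

# forest: 1 is the root of {0, 1, 3}; 2 is the root of {2, 4}
s = [1, 1, 2, 1, 2]
print(find(s, 3))                  # 1: walking up from 3 reaches root 1
print(find(s, 0) == find(s, 4))    # False: different roots, so different sets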

Union-Find: Implementation

# `s` is a list representing the partitions
# `e1` and `e2` are element IDs in partitions we wish to merge
def union(s, e1, e2):
    r1 = find(s, e1)
    r2 = find(s, e2)
    s[r2] = r1
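Continuing the small example from above, a single union call merges the two trees by repointing one root:

union(s, 3, 4)                     # find(3) -> 1 and find(4) -> 2, so s[2] becomes 1
print(s)                           # [1, 1, 1, 1, 2]
print(find(s, 4) == find(s, 0))    # True: everything is now in one set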

Union-Find: Complexity Analysis Space: O(n) Since we need this new array to store the partitions Find worst-case runtime: O(n) Need to walk up tree No guarantee the tree is balanced Union worst-case runtime: O(n) Relies on Find, so can’t be any better Can we do better? For space, no For runtime, yes!
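The “no guarantee the tree is balanced” point can be demonstrated with the naive implementation above: unioning in an unlucky order produces a single long chain, so one Find walks past every other element. A small sketch (this order is just one bad case):

n = 5
s = list(range(n))           # every element starts as its own root
for i in range(1, n):
    union(s, i, i - 1)       # each call hangs the previous root under the new one
print(s)                     # [1, 2, 3, 4, 4]: a chain, so find(s, 0) follows n - 1 links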

Union-Find: Improving Runtime Insight: for any given node, all we really care about is the root of its tree. In other words, the nodes in between don’t matter. Best-case scenario is a very “flat” tree: fewer “hops” during Find. [Slide figure: the same nodes arranged as a deeper tree and as a flat tree.]

Union-Find: Improving Runtime Insight: the two ways of performing a Union are not equivalent: which root becomes the new parent affects the shape of the resulting tree. Better to keep the resulting tree as “flat” as possible. Or, impact as few nodes as possible. [Slide figure: the two possible results of the same union, one flatter than the other.]

Union-Find: Improving Runtime Using the first insight, we can improve Find: as we walk up the tree, we can rewrite the parents of visited nodes to point directly to the root. This won’t improve the first Find, but will improve all future ones (“improve what you use”, “improve as you go”). Known as path compression. Using the second insight, we can improve Union: store either size or rank along with nodes, so you can compare subtrees, and choose the root of the new tree to be the bigger/deeper tree. Known as union-by-size or union-by-rank. Both are reasonable; we’ll be looking at union-by-size. Note: this means we now need to store both parent and size in each array cell.

Union-Find: Improved Implementation

# `s` is a list representing the partitions
# `e` is the ID of a particular element
def find(s, e):
    v = s[e]
    if e != v.parent:
        v.parent = find(s, v.parent)   # path compression: point directly at the root
    return v.parent

Union-Find: Improved Implementation

# `s` is a list representing the partitions
# `e1` and `e2` are element IDs in partitions we wish to merge
def union(s, e1, e2):
    r1 = find(s, e1)
    r2 = find(s, e2)
    if r1 == r2:
        return
    if s[r1].size > s[r2].size:
        s[r2].parent = r1
        s[r1].size += s[r2].size
    else:
        s[r1].parent = r2
        s[r2].size += s[r1].size
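This version assumes each array cell is a small record holding both a parent and a size. A minimal sketch of that setup plus a usage example (the Cell class and make_sets helper are my own names, chosen to match the fields used above):

class Cell:
    # one array cell: the parent's index plus the size of the subtree rooted here
    def __init__(self, parent, size=1):
        self.parent = parent
        self.size = size

def make_sets(n):
    # every element starts as its own root, in a set of size 1
    return [Cell(parent=i) for i in range(n)]

s = make_sets(6)
union(s, 0, 1)
union(s, 2, 3)
union(s, 0, 2)
print(find(s, 3) == find(s, 1))    # True: 0, 1, 2, 3 are now one set
print(s[find(s, 0)].size)          # 4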

Union-Find: Revised Complexity Analysis Space: still O(n) Storing size now, but still just an extra O(n) integers Runtime of Union is still dependent on runtime of Find So what’s the new runtime of Find/Union? Answer: almost O(1), amortized Note: “amortized” essentially means “smoothed out over many operations” Actual answer: O(⍺(n)), amortized

Digression: Ackermann Function if m = 0 A(m, n) = A(m - 1, 1) if m > 0 and n = 0 A(m - 1, A(m, n - 1)) if m > 0 and n > 0 This function grows REALLY FAST Example: A(4, 2) is 19,729 digits long If f(n) = A(n, n), then ⍺(n) = f-1(n) Known as the “inverse Ackermann function” ⍺(n) grows REALLY SLOW Example: ⍺(n) < 5 for literally any n that can be written in this physical universe OPTIONAL MATERIAL

Union-Find: Applications Image segmentation Used for self-driving cars - seriously! Cornell’s 2007 DARPA Urban Challenge car used this

Image Segmentation Every pixel is a node, and every node has an edge to each of its eight neighbors. Edge weights are distances in RGB space: sqrt((r1 - r2)^2 + (g1 - g2)^2 + (b1 - b2)^2). Start with each node in its own partition. Define Int(C) to be the greatest edge weight within connected component C, called the “internal difference”. Define T(C) to be k / |C|, where k is a constant, called the “threshold”. Iterate through edges from least to greatest weight. For edge (v1, v2): if v1 and v2 are already in the same connected component, remove the edge; otherwise, merge the connected components if w(v1, v2) < min(Int(C1) + T(C1), Int(C2) + T(C2)). OPTIONAL MATERIAL
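A rough sketch of that procedure under stated assumptions: the image is a nested list of (r, g, b) tuples, the sketch carries its own small index-based union-find (with path halving) rather than the earlier `s` structure, and the exact bookkeeping of Int(C) and the helper names are my own filling-in, not taken from the slides.

import math

def segment(image, k):
    # `image` is a list of rows of (r, g, b) tuples; `k` is the threshold constant.
    # Returns a parent array in which connected pixels share a root.
    h, w = len(image), len(image[0])
    n = h * w
    def idx(x, y):
        return y * w + x

    # Build edges to the eight neighbors, weighted by RGB distance.
    # (Each edge appears twice, once per direction; harmless for this sketch.)
    edges = []
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    nx, ny = x + dx, y + dy
                    if (dx or dy) and 0 <= nx < w and 0 <= ny < h:
                        r1, g1, b1 = image[y][x]
                        r2, g2, b2 = image[ny][nx]
                        wt = math.sqrt((r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2)
                        edges.append((wt, idx(x, y), idx(nx, ny)))
    edges.sort()

    parent = list(range(n))
    size = [1] * n
    internal = [0.0] * n             # Int(C), tracked at each component's root

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]   # path halving
            e = parent[e]
        return e

    for wt, a, b in edges:           # least to greatest weight
        ra, rb = find(a), find(b)
        if ra == rb:
            continue                 # already in the same component: drop the edge
        # merge only if the edge is small relative to both components
        if wt < min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            if size[ra] < size[rb]:
                ra, rb = rb, ra      # union by size: attach the smaller under the larger
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = wt        # edges arrive sorted, so this is the component's max
    return parent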

Union-Find: Applications Optical Character Recognition (OCR) Similar to image segmentation: can find similarly colored components to be the characters. Run character shapes through a machine learning pipeline to match known characters. Or, potentially just use a lookup table if you know the font! Glossing over some details: dealing with letters like “i”, which are not single connected components, and dealing with “ligatures”.