Data Structures & Algorithms Union-Find Example Richard Newman
Steps to Develop an Algorithm Define the problem – model it Determine constraints Find or create an algorithm to solve it Evaluate algorithm – speed, space, etc. If algorithm isn’t satisfactory, why not? Try to fix algorithm Iterate until solution found (or give up)
Dynamic Connectivity Problem Given a set of N elements Support two operations: Connect two elements Given two elements, is there a path between them?
Example Connect (4, 3) Connect (3, 8) Connect (6, 5) Connect (9, 4) Connect (2, 1) Are 0 and 7 connected (No) Are 8 and 9 connected (Yes)
Example (con’t) Connect (5, 0) Connect (7, 2) Connect (6, 1) Connect (1, 0) Are 0 and 7 connected (Yes) Now consider a problem with 10,000 elements and 15,000 connections…
Modeling the Elements Various interpretations of the elements: Pixels in a digital photo Computers in a network Socket pins on a PC board Transistors in a VLSI design Variable names in a C++ program Locations on a map Friends in a social network …… Convenient to just number 0 to N-1 Use as array index, suppress details
Modeling the Connections Assume “is connected to” is an equivalence relation Reflexive: a is connected to a Symmetric: if a is connected to b, then b is connected to a Transitive: if a is connected to b, and b is connected to c, then a is connected to c
Connected Components A connected component is a maximal set of elements that are mutually connected (i.e., an equivalence set) {0} {1,2} {3,4,8,9} {5,6} {7}
Implementing the Operations Recall – connect two elements, and answer if two elements have a path between them Find: in which component is element a? Union: replace components containging elements a and b with their union Connected: are elements a and b in the same component?
Example Union(1,6) {0} {1,2} {3,4,8,9} {5,6} {7} {0} {1,2,5,6} {3,4,8,9} {7} Components?
Union-Find Data Type Goal: Design an efficient data structure for union-find Number of elements can be huge Number of operations can be huge Union and find operations can be intermixed public class UF UF int(N); voidunion(int a, int b); intfind(int a); booleanconnected(int a, int b); ;;
Dynamic Connectivity Client Read in number of elements N from stdin Repeat: –Read in pair of integers from stdin –If not yet connected, connect them and print out pair read input int N while stdin is not empty read in pair of ints a and b if not connected (a, b) union(a, b) print out a and b ;;
Quick-Find Data Structure –Integer array id[] of length N –Interpretation: id[a] is the id of the component containing a i: id[i]:
Quick-Find Data Structure –Integer array id[] of length N –Interpretation: id[a] is the id of the component containing a Find: what is the id of a? Connected: do a and b have the same id? Union: Change all the entries in id that have the same id as a to be the id of b.
Quick-Find Union(1,6) i: id: i: id: It works – so is there a problem? Well, there may be many values to change, and many to search!
Quick-Find Quick-Find operation times –Initialization takes time O(N) –Union takes time O(N) –Find takes time O(1) –Connected takes time O(1) Union is too slow – it takes O(N 2 ) array accesses to process N union operations on N elements
Quadratic Algos Do Not Scale! Rough Standards (for now) –10 9 operations per second –10 9 words of memory –Touch all words in 1 second (+/- truism since 1950!) Huge problem for Quick-Find: 10 9 union commands on 10 9 elements Takes more than operations This is 30+ years of computer time!
Quadratic Algos Do Not Scale! They do not keep pace with technology New computer may be 10x as fast But it has 10x as much memory Want to solve problems 10x as big With quadratic algorithm, it takes… … 10 x as long!!!
Quick-Union Data Structure –Integer array id[] of length N –Interpretation: id[a] is the parent of a –Component is root of a = id[id[…id[a]…]] (fixed point) i: id[i]:
Quick-Union Data Structure – What is root of tree of a? – Do a and b have the same root? – Set id of root of b’s tree to be root of a’s tree i: id[i]: –Find: –Connected: –Union:
Quick-Union i: id[i]: –Find 9 –Connected 8, 9: –Union 7,5 5 Only ONE value changes! = FAST
Quick-Union Quick-Union operation times (worst case) –Initialization takes time O(N) –Union takes time O(N) (must find two roots) –Find takes time O(N) –Connected takes time O(N) Now union AND find are too slow – it takes O(N 2 ) array accesses to process N operations on N elements
Quick-Find/Quick-Union Observations: Problem with Quick-Find is unions –May take N array accesses –Trees are flat, but too expensive to keep them flat! Problem with Quick-Union –Trees may get tall –Find (and hence, connected and union) may take N array accesses
Weighted Quick-Union Make Quick-Union trees stay short! Keep track of tree size Join smaller tree into larger tree –May alternatively do union by height/rank –Need to keep track of “weight” b a a b Quick-Union may do this But we always want this
Weighted Quick-Union Weighted Quick-Union operation times –Initialization takes time O(N) –Union takes time O(1) (given roots) –Find takes time O(depth of a) –Connected takes time O(max {depth of a, b}) Proposition: Depth of any node x is at most lg N Pf: What causes depth of x to increase?
Weighted Quick-Union Proposition: Depth of any node x is at most lg N Pf: What causes depth of x to increase? Only union! And if x is in smaller tree. So x’s tree must at least double in size each time union increases x’s depth Which can happen at most lg N times. (Why?)
Next – Lecture 3 Read Chapter 2 Empirical analysis Asymptotic analysis of algorithms Basic recurrences