Compsci 201, Union-Find Algorithms
Owen Astrachan
Jeff Forbes
October 6, 2017
10/6/17 Compsci 201, Fall 2017, Union-Find
K is for …
- K-means: Unsupervised learning can help predict a user’s preferences or identify users with similar properties
- Key Pairs: Public & private keys make encryption happen
- Knuth, Donald: Wrote The Art of Computer Programming
APT Quiz
- Assess your ability to write code to solve problems in a more natural environment
- 2.5 hours from opening to submission
- Restricted collaboration policy
  - Do not discuss Quiz with anyone other than course personnel
  - Do not use online resources
  - Post private posts on Piazza
It’s time for the Percolator!
Towards Percolation
- A model for physical systems
- Pour liquid on top of porous material. Will it reach the bottom?
- Applications: modeling flow of electricity, spread of forest fires, gas flow, …
- System percolates iff top and bottom are connected by open sites.
Random Percolation
- Given an N-by-N system where each site is open with probability p, what is the probability that the system percolates?
- Open question in statistical physics
- Take a computational approach: Monte Carlo simulation
[Figure: five sample grids: p = 0.3 (does not percolate), p = 0.4 (does not percolate), p = 0.5 (does not percolate), p = 0.6 (percolates), p = 0.7 (percolates)]
Phase Transition
- For large N, there is a sharp threshold p*
  - p > p*: almost certainly percolates
  - p < p*: almost certainly does not percolate
Finding the Threshold
- Initialize N-by-N grid of sites as blocked
- Randomly open sites until the system percolates
- Fraction of open sites gives an estimate of p*
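Sketch (not from the slides): the three threshold-finding steps can be tried out directly. The version below is a minimal illustration that checks percolation with a plain depth-first flood fill from the top row, which is a stand-in chosen to keep the example self-contained, not the union-find approach this lecture builds toward. The class name `ThresholdEstimate`, its method names, and the 4-neighbor (up, down, left, right) notion of adjacency are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

// Hypothetical Monte Carlo sketch for estimating the percolation threshold p*.
class ThresholdEstimate {

    // True if some path of open 4-neighbors joins the top row to the bottom row.
    static boolean percolates(boolean[][] open) {
        int n = open.length;
        boolean[][] seen = new boolean[n][n];
        Deque<int[]> stack = new ArrayDeque<>();
        for (int c = 0; c < n; c++) {
            if (open[0][c]) {
                seen[0][c] = true;
                stack.push(new int[] {0, c});
            }
        }
        int[][] moves = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        while (!stack.isEmpty()) {
            int[] cur = stack.pop();
            if (cur[0] == n - 1) return true;   // reached the bottom row
            for (int[] m : moves) {
                int r = cur[0] + m[0];
                int c = cur[1] + m[1];
                if (r >= 0 && r < n && c >= 0 && c < n && open[r][c] && !seen[r][c]) {
                    seen[r][c] = true;
                    stack.push(new int[] {r, c});
                }
            }
        }
        return false;
    }

    // One trial: start all blocked, open random sites until the grid percolates,
    // then report the fraction of sites that are open.
    static double oneTrial(int n, Random rng) {
        boolean[][] open = new boolean[n][n];
        int opened = 0;
        while (!percolates(open)) {
            int r = rng.nextInt(n);
            int c = rng.nextInt(n);
            if (!open[r][c]) {
                open[r][c] = true;
                opened++;
            }
        }
        return (double) opened / (n * n);
    }

    // Average the per-trial fractions to estimate p*.
    static double estimate(int n, int trials, long seed) {
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int t = 0; t < trials; t++) {
            sum += oneTrial(n, rng);
        }
        return sum / trials;
    }
}
```

Re-running the flood fill after every opened site is slow for large N, which is one motivation for a data structure that answers connectivity queries quickly.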
Modeling Percolation
- How to check whether an N-by-N system percolates?
- Create an object for each site and name them 0 to N^2 - 1.
[Figure: 5-by-5 grid (N = 5) with sites numbered 0 to 24; legend: open, blocked]
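Sketch (not from the slides): naming sites 0 to N^2 - 1 needs a mapping between grid coordinates and site names. The helper below assumes row-major order (row 0 holds sites 0 to N - 1, and so on); the class `SiteIndex` and its method names are hypothetical.

```java
// Hypothetical helper: converts between (row, col) and a site name in 0..n*n-1,
// assuming row-major numbering.
class SiteIndex {

    static int toIndex(int row, int col, int n) {
        if (row < 0 || row >= n || col < 0 || col >= n) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ") is outside the grid");
        }
        return row * n + col;
    }

    static int rowOf(int index, int n) {
        return index / n;
    }

    static int colOf(int index, int n) {
        return index % n;
    }
}
```

With N = 5, the site at row 2, column 3 gets the name 2 * 5 + 3 = 13.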
System Percolates?
- Percolates iff virtual top site is connected to virtual bottom site.
[Figure: 5-by-5 grid (N = 5) with sites numbered 0 to 24; a virtual top site above the top row and a virtual bottom site below the bottom row; legend: open site, full site, blocked site]
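Sketch (not from the slides): one way to realize the virtual-site idea is to reserve two extra names, N^2 for the virtual top and N^2 + 1 for the virtual bottom, and connect each newly opened site to its open neighbors. The class `PercolationSketch`, the row-major site numbering, and the choice of a plain (unweighted) quick-union inside it are assumptions made for this example.

```java
// Hypothetical percolation model using two virtual sites and a simple quick-union.
class PercolationSketch {
    private final int n;
    private final boolean[] open;   // open[site] is true once the site is opened
    private final int[] parent;     // quick-union parent links, including virtual sites
    private final int top;          // name of the virtual top site
    private final int bottom;       // name of the virtual bottom site

    PercolationSketch(int n) {
        this.n = n;
        open = new boolean[n * n];
        parent = new int[n * n + 2];
        for (int i = 0; i < parent.length; i++) {
            parent[i] = i;
        }
        top = n * n;
        bottom = n * n + 1;
    }

    private int find(int p) {
        while (p != parent[p]) {
            p = parent[p];
        }
        return p;
    }

    private void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        parent[rootP] = rootQ;
    }

    // Open a site and connect it to the virtual sites and open neighbors as needed.
    void open(int row, int col) {
        int site = row * n + col;
        if (open[site]) return;
        open[site] = true;
        if (row == 0) union(site, top);
        if (row == n - 1) union(site, bottom);
        if (row > 0 && open[site - n]) union(site, site - n);
        if (row < n - 1 && open[site + n]) union(site, site + n);
        if (col > 0 && open[site - 1]) union(site, site - 1);
        if (col < n - 1 && open[site + 1]) union(site, site + 1);
    }

    // The system percolates iff the two virtual sites are in the same component.
    boolean percolates() {
        return find(top) == find(bottom);
    }
}
```

The virtual sites turn "is any top-row site connected to any bottom-row site?" into a single connectivity query.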
Efficient Algorithms – Union Find
Steps to developing a usable algorithm
- Model the problem
- Find an algorithm to solve it
- Analyze
  - Optimal? If not, improve
  - Fast enough? Fits in memory?
- Iterate until satisfied
Union Find Example: Connect the Dots!
- N objects, 2 operations:
  - Connect 2 objects
  - Is there a path connecting the two objects?
[Figure: dots numbered 0 to 9]
Union Find Example: Connect the Dots!
- 0 & 7 connected? No
- 8 & 9 connected? Yes
- Connect 5 & 0
- Connect 7 & 2
- Connect 6 & 1
- Connect 1 & 0
[Figure: dots numbered 0 to 9 with the connections drawn]
Maze Connectivity
- Is there a path that connects p & q?
[Figure: maze with points p and q marked]
Connectivity is an Equivalence Relation
"is connected to" is an equivalence relation:
- Reflexive: p is connected to p
- Symmetric: if p is connected to q, then q is connected to p
- Transitive: if p is connected to q and q is connected to r, then p is connected to r
Connected component: maximal set of objects that are mutually connected
[Figure: objects 0 to 7 forming 3 connected components: {0} {1 4 5} {2 3 6 7}]
Union-Find API
Goal: Design efficient data structure for union-find
- Many objects N
- Many operations M
- Union and find operations may be intermixed

    public interface IUnionFind {
        void initialize(int N);            // initialize with N objects (0 to N-1)
        void union(int p, int q);          // add connection between p and q
        int find(int p);                   // id of component for p (0 to N-1)
        boolean connected(int p, int q);   // are p & q connected?
    }
Quick Find
- Data structure: integer array id[N]
- Interpretation: id[p] is the id of the component containing p
- Example: 0, 5, 6 connected; 1, 2, 7 connected; 3, 4, 8, 9 connected
  [Figure: array id[] indexed 0 to 9]
- Find: What is the id of p?
- Connected: Do p and q have the same id?
- Union: To merge the components containing p and q, change all entries with id = id[p] to id[q]
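Sketch (not from the slides): the quick-find rules above translate almost line for line into code. The class name `QuickFind` is an assumption; the method names follow the IUnionFind interface from the API slide, but the class does not declare `implements IUnionFind` so that the example compiles on its own.

```java
// Minimal quick-find sketch: id[p] is the id of the component containing p.
class QuickFind {
    private int[] id;

    // Each object starts in its own component.
    void initialize(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
        }
    }

    // One array access.
    int find(int p) {
        return id[p];
    }

    boolean connected(int p, int q) {
        return id[p] == id[q];
    }

    // Scans the whole array: every entry equal to id[p] becomes id[q].
    void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        if (pid == qid) return;
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) {
                id[i] = qid;
            }
        }
    }
}
```

Saving `id[p]` in `pid` before the loop matters: the loop overwrites `id[p]` itself, so comparing against `id[p]` directly would stop matching partway through the scan.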
Quick-find Union is Too Slow
- Union: To merge the components containing p and q, change all entries with id = id[p] to id[q]
- On the order of N^2 array accesses for N unions

algorithm     initialize   union   find   connected
quick-find    N            N       1      1
Quick Union
- Data structure: int array id[N]
- id[i] is parent of i
- Root of i is id[id[id[... id[i]...]]]: keep going until it doesn’t change (union algorithm ensures no cycles)
[Figure: forest over objects 0 to 9; parent of 3 is 4, root of 3 is 9]
Quick Union: Find & Connected
- Integer array id[N]: id[i] is parent of i; root of i is id[id[id[... id[i]...]]]
- Find: What is the root of p?
- Connected: Do p and q have the same root?
[Figure: p = 3, q = 5; root of 3 is 9, root of 5 is 6; different roots means 3 & 5 are not connected]
Quick Union: Union
- id[N]: id[i] is parent of i; root of i is id[id[… id[i]...]]
- Union: To merge components containing p and q, set the id of p's root to the id of q's root
[Figure: Union 3 & 5 (p = 3, q = 5), before and after; only one value changes]
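Sketch (not from the slides): quick-union keeps a parent link per object and follows links to the root. The class name `QuickUnion` is an assumption; method names follow the IUnionFind interface from the API slide, without declaring `implements IUnionFind` so the example stands alone.

```java
// Minimal quick-union sketch: id[i] is the parent of i; a root is its own parent.
class QuickUnion {
    private int[] id;

    void initialize(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
        }
    }

    // Follow parent links until the value stops changing.
    int find(int p) {
        while (p != id[p]) {
            p = id[p];
        }
        return p;
    }

    boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    // Only one array entry changes: p's root now points at q's root.
    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        id[rootP] = rootQ;
    }
}
```

If p and q already share a root, `union` writes the root's own value back into its slot, so no cycle is created.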
Quick Union is Also Too Slow

algorithm                  initialize   union   find   connected
quick-find                 N            N       1      1
quick-union (worst case)   N            N       N      N

(quick-union's union cost includes finding the two roots)

Quick-find needs improvement
- Union too expensive (N array accesses)
- Trees are flat, but too expensive to keep them flat
Quick-union needs improvement
- Trees can get tall
- Find/connected too expensive: possibly N array accesses
WOTO: http://bit.ly/201-f17-1006-1
Improve? Weighted Quick-Union
- Modify quick-union to avoid tall trees
- Track: size of each tree (number of objects)
- Balance: link root of smaller tree to root of larger tree
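Sketch (not from the slides): weighting adds a second array that tracks tree sizes and uses it to decide which root becomes the child. The class name `WeightedQuickUnion`, the `size` array name, and the `depth` helper (included only to inspect tree height) are assumptions for this example.

```java
// Minimal weighted quick-union sketch: always link the smaller tree under the larger.
class WeightedQuickUnion {
    private int[] id;     // id[i] is the parent of i
    private int[] size;   // size[r] is the number of objects in the tree rooted at r

    void initialize(int n) {
        id = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            size[i] = 1;
        }
    }

    int find(int p) {
        while (p != id[p]) {
            p = id[p];
        }
        return p;
    }

    boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;
        if (size[rootP] < size[rootQ]) {
            id[rootP] = rootQ;            // p's tree is smaller: hang it under q's root
            size[rootQ] += size[rootP];
        } else {
            id[rootQ] = rootP;            // otherwise hang q's tree under p's root
            size[rootP] += size[rootQ];
        }
    }

    // Hypothetical inspection helper: number of links from p up to its root.
    int depth(int p) {
        int d = 0;
        while (p != id[p]) {
            p = id[p];
            d++;
        }
        return d;
    }
}
```

Because a node's depth grows only when its tree is merged into one at least as large, the tree containing it at least doubles each time, so depth stays at most log base 2 of N.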
Path compression!
- Just after computing the root of p, set the id[] of each examined node to point to that root
[Figure sequence: tree rooted at 1 with p = 9; examined nodes 9, 6, 3, 1 (1 is already “compressed”); over the following frames, nodes 9, 6, and 3 are re-pointed directly at the root]
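Sketch (not from the slides): a two-pass version of path compression first walks up to find the root, then walks the same path again and points every examined node at that root. The class name `QuickUnionPathCompression` and the `parentOf` helper (included only to observe the effect) are assumptions; the union here is unweighted to keep the example focused on compression.

```java
// Minimal quick-union sketch with two-pass path compression in find.
class QuickUnionPathCompression {
    private int[] id;   // id[i] is the parent of i

    void initialize(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
        }
    }

    int find(int p) {
        // Pass 1: locate the root.
        int root = p;
        while (root != id[root]) {
            root = id[root];
        }
        // Pass 2: re-point every examined node directly at the root.
        while (p != root) {
            int next = id[p];
            id[p] = root;
            p = next;
        }
        return root;
    }

    boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        id[rootP] = rootQ;
    }

    // Hypothetical inspection helper: current parent of p.
    int parentOf(int p) {
        return id[p];
    }
}
```

After one `find` on a deep node, later finds on any node along that path take a single step to the root.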
Scoreboard
- Weighted quick union and/or path compression leads to an efficient algorithm
- Order of growth for initialize + M union-find operations on a set of N objects

Algorithm                         Worst-case time
quick-find                        M N
quick-union                       M N
weighted QU                       N + M log N
QU + path compression             N + M log N
weighted QU + path compression    N + M lg* N

lg* function:

N          lg* N
1          0
2          1
4          2
16         3
65536      4
2^65536    5

lg* N is at most 5 for any reasonable N, so N + M lg* N is effectively linear in N + M
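Sketch (not from the slides): lg* N counts how many times lg must be applied before the value drops to 1 or below. The version below uses the floor of lg at each step, which agrees with the table for the towers of 2 listed there; the class name `LogStar` and the use of `BigInteger` (so that 2^65536 can be represented) are choices made for this example.

```java
import java.math.BigInteger;

// Hypothetical helper: iterated logarithm using floor(lg n) at each step.
class LogStar {

    static int lgStar(BigInteger n) {
        int count = 0;
        while (n.compareTo(BigInteger.ONE) > 0) {
            // For n >= 1, bitLength() - 1 equals floor(lg n).
            n = BigInteger.valueOf(n.bitLength() - 1);
            count++;
        }
        return count;
    }

    static int lgStar(long n) {
        return lgStar(BigInteger.valueOf(n));
    }
}
```

For example, 65536 goes to 16, then 4, then 2, then 1, which is four applications of lg.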
Recap
- Percolation
- Union-Find (Efficient Algorithms)
  - Quick Find
  - Quick Union
  - Improvements: Balancing Trees, Path Compression
Reflect
- What’s clear? What’s still muddy?
- http://bit.ly/201-f17-reflect