I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis
Pankaj K. Agarwal, Lars Arge, Ke Yi
Duke University; University of Aarhus

The Union-Find Problem
A universe of N elements: x_1, x_2, …, x_N
Initially N singleton sets: {x_1}, {x_2}, …, {x_N}
Each set has a representative
Maintain the partition under
–Union(x_i, x_j): joins the sets containing x_i and x_j
–Find(x_i): returns the representative of the set containing x_i

The Solution
[Figure: example trees with labeled nodes; the representatives are marked]
Union(d, h): link-by-rank
Find(n): path compression

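The two techniques named on this slide can be sketched in a few lines of Python. This is a minimal in-memory illustration, not code from the talk; the class name `UnionFind` and its method names are assumptions made for the example.

```python
class UnionFind:
    """In-memory union-find with link-by-rank and path compression (illustrative)."""

    def __init__(self, elements):
        # Every element starts as the representative of its own singleton set.
        self.parent = {x: x for x in elements}
        self.rank = {x: 0 for x in elements}

    def find(self, x):
        # First pass: walk up to the root, which is the representative.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx  # already in the same set
        # Link-by-rank: hang the root of lower rank below the other root.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx
```

For example, after `uf = UnionFind("abdhn")` and `uf.union("d", "h")`, the calls `uf.find("d")` and `uf.find("h")` return the same representative, while `uf.find("n")` still returns `"n"`.
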
Complexity
O(N α(N)) for a sequence of N union and find operations [Tarjan 75]
–α(): inverse Ackermann function (grows very slowly)
–Optimal in the worst case [Tarjan 79, Fredman and Saks 89]
Batched (off-line) version
–Entire sequence known in advance
–Can be improved to linear time on a RAM [Gabow and Tarjan 85]
–Linear time is not possible on a pointer machine [Tarjan 79]

Simple and Good, as long as … The entire data structure fits in memory
The I/O Model
Main memory of size M
Disk of infinite size
One I/O transfers B items between memory and disk

Sources of “Non-Locality”
Two operands in a union
Nodes on a leaf-to-root path
Operands in consecutive operations
–Cannot be removed in the on-line case
All of them must be eliminated to get below one I/O per operation!

Our Results
An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O((N/B) log_{M/B}(N/B)) I/Os
–Same as sorting
–Optimal in the worst case
A practical algorithm using O(sort(N) log(N/M)) I/Os
–Implemented
Applications to terrain analysis
–Topological persistence: O(sort(N)) I/Os (implemented)
–Contour trees: O(sort(N)) I/Os

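To get a feel for the two bounds, the sketch below evaluates them numerically with all constant factors dropped. The function names `sort_io` and `practical_io` are made up for this example, and the base-2 logarithm in the log(N/M) factor is an assumption (the slide does not state a base).

```python
import math


def sort_io(n, m, b):
    """sort(N) = (N/B) * log_{M/B}(N/B), ignoring constant factors."""
    return (n / b) * (math.log2(n / b) / math.log2(m / b))


def practical_io(n, m, b):
    """sort(N) * log(N/M); the base-2 logarithm is an assumption."""
    return sort_io(n, m, b) * math.log2(n / m)


if __name__ == "__main__":
    # Hypothetical machine: N = 2^30 items, M = 2^20 items of memory, B = 2^10 items per block.
    n, m, b = 2**30, 2**20, 2**10
    print(f"one I/O per operation: {n:.3e}")
    print(f"sort(N):               {sort_io(n, m, b):.3e}")
    print(f"sort(N) log(N/M):      {practical_io(n, m, b):.3e}")
```

With these hypothetical parameters, sort(N) is roughly two million I/Os, several hundred times fewer than one I/O per operation.
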
I/O-Efficient Batched Union-Find
Assumption: no redundant unions
–Each union must join two different sets
–This assumption is removed later
Two-stage algorithm
–Convert to interval union-find: compute an order on the elements such that each union joins two adjacent sets
–Solve batched interval union-find

Union Tree
[Figure: two equivalent union trees on the nodes r, a, b, c, d, e, f, g, h, i]
1: Union(d, g)
2: Union(a, c)
3: Union(r, b)
4: Union(a, e)
5: Union(e, i)
6: Union(r, a)
7: Union(a, d)
8: Union(d, h)
9: Union(b, f)

Transforming the Union Tree
[Figure: the union tree transformed step by step]
Weights along each root-to-leaf path decrease

Formulating as a Batched Problem
[Figure: original and transformed union trees]
For each edge, find the lowest ancestor edge with a higher weight

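The query on this slide can be illustrated with a naive in-memory walk, using the nine unions listed on the "Union Tree" slide. This is only a sketch of what is being asked, not the I/O-efficient batched method of the talk. Two assumptions are made here: the k-th union contributes a tree edge of weight k, and the first operand of each union is the parent. The function names are hypothetical.

```python
# Unions from the "Union Tree" slide; position in the list (1-based) is the time stamp.
UNIONS = [("d", "g"), ("a", "c"), ("r", "b"), ("a", "e"), ("e", "i"),
          ("r", "a"), ("a", "d"), ("d", "h"), ("b", "f")]


def build_union_tree(unions):
    """Return {child: (parent, weight)}; assumes the first operand is the parent."""
    return {child: (parent, t) for t, (parent, child) in enumerate(unions, start=1)}


def lowest_heavier_ancestor_edge(tree, child):
    """For the edge entering `child`, walk upward to the first edge of higher weight.

    Returns (parent, node, weight) for that ancestor edge, or None if there is none.
    """
    node, weight = tree[child]
    while node in tree:  # the root has no entry in `tree`
        parent, w = tree[node]
        if w > weight:
            return (parent, node, w)
        node = parent
    return None


if __name__ == "__main__":
    tree = build_union_tree(UNIONS)
    for child in sorted(tree):
        print(child, tree[child], "->", lowest_heavier_ancestor_edge(tree, child))
```

Each query walks a full root path in the worst case, so this naive version is quadratic in time and has no locality.
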
Cast in a Geometry Setting
[Figure: union tree with its Euler tour]
Euler tour: in O(sort(N)) I/Os [Chiang et al. 95]
x: weight
y: positions in the tour

Cast in a Geometry Setting
[Figure: union tree]
For each edge, find the lowest ancestor edge with a higher weight
For each segment, find the shortest segment above and containing it

Distribution Sweeping
[Figure: M/B vertical slabs, with the labels "checked here" and "checked recursively"]
Total cost: O(sort(N)) I/Os

In-Order Traversal
[Figure: transformed union tree and the element order produced by the traversal]
Weights along each root-to-leaf path decrease
At u, with children u_1, …, u_k (in increasing order of weight)
1. Recursively visit the subtree at u_1
2. Output u
3. For i = 2, …, k: recursively visit the subtree at u_i
Claim: this traversal produces the right order

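The traversal can be tried out in memory as follows. The tree `TREE` is a hand-derived, hypothetical transformed tree for the nine unions of the "Union Tree" slide (it is not copied from the slide's figure, and other valid trees and orders may exist). The checker `joins_adjacent_sets` is likewise an assumption of this sketch: it replays the unions in time order and confirms that each one merges two neighbouring intervals of the computed order.

```python
UNIONS = [("d", "g"), ("a", "c"), ("r", "b"), ("a", "e"), ("e", "i"),
          ("r", "a"), ("a", "d"), ("d", "h"), ("b", "f")]

# Hypothetical transformed tree: {node: [(edge weight, child), ...]}.
# Weights decrease along every root-to-leaf path.
TREE = {
    "r": [(3, "b"), (6, "a"), (7, "d"), (8, "h"), (9, "f")],
    "a": [(2, "c"), (4, "e"), (5, "i")],
    "d": [(1, "g")],
}


def traversal_order(tree, u, out=None):
    """Visit the lightest child's subtree, output u, then visit the remaining children."""
    out = [] if out is None else out
    children = [c for _, c in sorted(tree.get(u, []))]  # increasing weight
    if children:
        traversal_order(tree, children[0], out)  # step 1
    out.append(u)                                # step 2
    for c in children[1:]:                       # step 3
        traversal_order(tree, c, out)
    return out


def joins_adjacent_sets(order, unions):
    """Replay the unions in time order; each must merge two neighbouring intervals."""
    pos = {x: i for i, x in enumerate(order)}
    members = {x: [x] for x in order}  # element -> list shared by its whole set
    for x, y in unions:
        a, b = members[x], members[y]
        if a is b:
            return False  # redundant union
        lo_a, hi_a = min(pos[e] for e in a), max(pos[e] for e in a)
        lo_b, hi_b = min(pos[e] for e in b), max(pos[e] for e in b)
        if hi_a + 1 != lo_b and hi_b + 1 != lo_a:
            return False  # the two sets are not adjacent in the order
        a.extend(b)
        for e in b:
            members[e] = a
    return True


if __name__ == "__main__":
    order = traversal_order(TREE, "r")
    print(" ".join(order), joins_adjacent_sets(order, UNIONS))
```

Any order that passes the check is acceptable for the second stage; alphabetical order, for instance, fails at the first union because d and g are not neighbours.
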
Solving Interval Union-Find
Union
–x: two operands
–y: time stamp
Find
–x: operand
–y: time stamp
Four instances of batched ray shooting: O(sort(N)) I/Os

Handling Redundant Unions
Union tree becomes a graph
Compute the minimum spanning tree
–O(sort(N)) I/Os (randomized) [Chiang et al. 95]
–O(sort(N) log log B) I/Os (deterministic) [Arge et al. 04]
–Deterministic O(sort(N)) I/Os if the graph is planar
–Only MST edges are non-redundant

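The last point can be illustrated in memory with a Kruskal-style scan. The sketch assumes that each union's time stamp serves as its edge weight, so Kruskal's increasing-weight order coincides with time order; an edge is kept exactly when its union joins two different sets. The function name `non_redundant_unions` is hypothetical, and the talk itself relies on the external-memory MST algorithms cited above, not on this scan.

```python
def non_redundant_unions(unions):
    """Return [(time, x, y), ...] for the unions that join two different sets."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for t, (x, y) in enumerate(unions, start=1):
        rx, ry = find(x), find(y)
        if rx != ry:  # this edge belongs to the spanning tree
            parent[rx] = ry
            kept.append((t, x, y))
    return kept


if __name__ == "__main__":
    # Hypothetical sequence: unions 3 and 5 close a cycle and are redundant.
    seq = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "a")]
    print(non_redundant_unions(seq))
```
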
A Practical Algorithm
The previous algorithm is too complicated
–2 Euler tours
–4 instances of batched ray shooting
–MST
A simple and practical algorithm
–Divide-and-conquer
–O(sort(N) log(N/M)) I/Os
–Implemented

Applications
1. Topological Persistence
2. Contour Trees

Topological Persistence
Formulated as Batched Union-Find
Terrain represented as a triangulated mesh
Consider minimum-saddle pairs
On reaching
–A minimum or maximum: do nothing
–A regular point u: issue union(u, v) for a lower neighbor v
–A saddle u: let v and w be nodes from the two connected pieces of u's lower link; issue find(v), find(w), union(u, v), union(u, w)
[Figure: lower link of a vertex]

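The rules on this slide can be turned into a small generator of the batched operation sequence. The sketch rests on several assumptions that are not stated on the slide: vertices are processed in increasing height, all heights are distinct, and `link[u]` lists the neighbors of u in cyclic order around u (boundary vertices are not treated specially). The function name `emit_operations` and the sample mesh are hypothetical.

```python
def emit_operations(height, link):
    """Emit the find/union sequence, one vertex at a time in increasing height."""
    ops = []
    for u in sorted(height, key=height.get):
        lower = [height[v] < height[u] for v in link[u]]
        if not any(lower) or all(lower):
            continue  # minimum or maximum: do nothing
        # A connected piece of the lower link starts where a lower neighbor
        # follows a non-lower one in the cyclic order (index -1 wraps around).
        pieces = [link[u][i] for i in range(len(lower)) if lower[i] and not lower[i - 1]]
        if len(pieces) == 1:  # regular point
            ops.append(("union", u, pieces[0]))
        elif len(pieces) == 2:  # simple saddle
            v, w = pieces
            ops += [("find", v), ("find", w), ("union", u, v), ("union", u, w)]
        else:
            raise ValueError(f"vertex {u!r} is a multi-saddle; not covered by this sketch")
    return ops


if __name__ == "__main__":
    # Hypothetical mesh: saddle s surrounded by minima a, b and maxima p, q.
    height = {"a": 0, "b": 1, "s": 2, "p": 3, "q": 4}
    link = {
        "a": ["q", "s", "p"],
        "b": ["p", "s", "q"],
        "s": ["a", "p", "b", "q"],
        "p": ["a", "s", "b"],
        "q": ["b", "s", "a"],
    }
    for op in emit_operations(height, link):
        print(op)
```

In the sample mesh only the saddle s produces operations: find(a), find(b), union(s, a), union(s, b).
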
Contour Trees
Previous Results
Directly maintain contours
–O(N log N) time [van Kreveld et al. 97]
–Needs union-split-find for circular lists
–Does not extend to higher dimensions
Two sweeps maintaining components, then merge
–O(N log N) time [Carr et al. 03]
–Extends to arbitrary dimensions

Join Tree and Split Tree
[Figure: join tree and split tree, with qualified nodes marked]

Final Contour Tree
[Figure: join tree, split tree, and the resulting contour tree]
Hard to BATCH!

Another Characterization
[Figure: join tree, split tree, and contour tree, with nodes u, v, w marked]
Let w be the highest node that is a descendant of v in the join tree and an ancestor of u in the split tree; then (u, w) is a contour tree edge
Now can BATCH!

Experiment 1: Random Union-Find
Experiment 2: Topological Persistence on Terrain Data
Data: Neuse River Basin of NC

Experiment 2: Topological Persistence on Terrain Data
Summary
An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O((N/B) log_{M/B}(N/B)) I/Os
–Optimal in the worst case
A practical algorithm using O(sort(N) log(N/M)) I/Os
Applications to terrain analysis
–Topological persistence: O(sort(N)) I/Os
–Contour trees: O(sort(N)) I/Os
Open question: the on-line case
–Can we get below O(N α(N)) I/Os?

Thank you!