Binary Search on a Tree Shay Mozes (Brown University) Krzysztof Onak (MIT) Oren Weimann (MIT)
Locating differences – Binary Search on a Tree ? ? Directories checksums = 4
Locating differences – Binary Search on a Tree ? ? Directories checksums = 5
Search Startegy search strategy = decision tree Optimal strategy = shallowest decision tree a (a,c)? (a,b)? (c,e)? a c (a,d)? b d (c,f)? (e,g)? e f g b d c e f Binary search – tree is a line. Decision tree – complete binary tree g
Searching in trees and posets Related Work Tree edge ranking Searching in trees and posets IRV [Disc. Appl. Math. 1991]: 2-approx. in O(nlogn) TGS [Algorithmica 1995]: O(n3logn) LY [SODA 1998]: O(n) BFN [SODA 1997]: O(n4log3n) LN [ENDM 2001]: 2-approx. in O(nlogn) OP [FOCS 2006]: O(n3) This paper: O(n) Applications: Ours: filesystem sync, bug detection Ranking: VLSI design, matrix factorization BFN – Ben Asher Farchi Newman LN – Laber Nogueira OP = Onak Parys IRV – Iyer, Ratliff, Vijayan TGS – de la Torre, Greenlaw, Schaffer LY – Lam, Yue
Strategy Function f : E → {1,2,3,…} If f(e1 ) = f(e2 ), then on the path from e1 to e2 there is e3 with f(e3 ) > f(e1 ) = f(e2 ) strategy function bounded by k decision tree of height k a (a,c)? (a,b)? (c,e)? a c (a,d)? b d (c,f)? (e,g)? e f g 3 2 1 Intuition: f(e) = k, at most k more queries to go. b d c e f g
Solution Approach Given tree Find strategy function with the least maximum Convert into a decision tree c a d f e g b (a,c)? (a,b)? (c,e)? a c (a,d)? b d (c,f)? (e,g)? e f g 3 2 1
Visibility e is visible from vertex v if on path from v to e there is no value greater than e Lexicographic order on visibility sequences e.g.: (3,2,1) > (3,1) 3,1 a 1 3 3,2,1 d c 2 1 e f 1 g
Bottom-Up Approach Given visibility sequences at children of v Extend to minimal visibility sequence at v Theorem [OP, de la Torre et al.]: Minimal extensions accumulate to an optimal solution Given strategy functions at children of v v 2 3 3 2 c d f e g 2 1 2 1 1
Valid Extension . . . An extension assigns all f(ei)’s v f(e1)? f(e2)? f(ek)? s1 s2 sk 4 1 4 1 1 . . .
Valid Extension . . . An extension assigns all f(ei)’s f(ei)= f(ej) v 3 3 f(ek)? s1 s2 sk 4 1 4 1 1 . . .
Valid Extension . . . An extension assigns all f(ei)’s f(ei)= f(ej) f(ei) is not in si v 4 f(e2)? f(ek)? s1 s2 sk 4 1 4 1 1 . . .
Valid Extension . . . An extension assigns all f(ei)’s f(ei)= f(ej) f(ei) is not in si f(ei) is in sj f(ej) > f(ei) v 3 f(e2)? 4 s1 s2 sk 4 1 4 1 1 . . .
Valid Extension . . . An extension assigns all f(ei)’s f(ei)= f(ej) f(ei) is not in si f(ei) is in sj f(ej) > f(ei) u is in si and sj max{f(ei), f(ei)} > u v 2 3 f(ek)? s1 s2 sk 4 1 4 1 1 . . .
Algorithm Outline . . . v f(e1)? f(e2)? f(ek)? s1 s2 sk free values 1 4 1 1 1 . . . free values 1 2 3 4 5 6
Algorithm Outline . . . set u = max{si} u v f(e1)? f(e2)? f(ek)? s1 s2 sk 4 1 1 1 . . . u is most problematic u free values 1 2 3 4 5 6
Algorithm Outline . . . set u = max{si} while not all edges assigned u v f(e1)? f(e2)? f(ek)? s1 s2 sk 4 1 1 1 . . . u free values 1 2 3 4 5 6
Algorithm Outline . . . set u = max{si} while not all edges assigned u if u appears once mark u as not free, move to next largest u v f(e1)? f(e2)? f(ek)? s1 s2 sk 4 1 1 1 . . . - u is not problematic cause appears only once, but is not free to use again - Next largest u that actually appears in some s_i u free values 1 2 3 4 5 6
Algorithm Outline . . . set u = max{si} while not all edges assigned u if u appears once mark u as not free, move to next largest u v f(e1)? f(e2)? f(ek)? s1 s2 sk 4 1 1 1 . . . u free values 1 2 3 5 6
Algorithm Outline . . . set u = max{si} while not all edges assigned u if u appears once mark u as not free, move to next largest u otherwise: v f(e1)? f(e2)? f(ek)? s1 s2 sk 4 1 1 1 . . . u free values 1 2 3 5 6
Algorithm Outline . . . set u = max{si} while not all edges assigned w if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u v f(e1)? f(e2)? f(ek)? s1 s2 sk 4 1 1 1 . . . w is smallest value that can resolve u w u w free values 1 2 3 5 6
Algorithm Outline . . . set u = max{si} while not all edges assigned w if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w v f(e1)? f(e2)? f(ek)? s1 s2 sk 4 1 1 1 . . . Its best to assign w to the visibility sequence s.t w makes many values in it disappear – -NOT NECCESARILY ONE OF THE EDGES THAT u APPEARS IN w u w free values 1 2 3 5 6 Sj
Algorithm Outline . . . set u = max{si} while not all edges assigned w if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free v f(e1)? f(e2)? f(ek)? s1 s2 sk 4 1 1 1 . . . w u w free values 1 3 5 6 Sj
Algorithm Outline . . . set u = max{si} while not all edges assigned w if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w v 2 f(e2)? f(ek)? s1 s2 sk 4 1 1 1 . . . w u w free values 1 3 5 6 Sj
Algorithm Outline . . . set u = max{si} while not all edges assigned w if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w mark all Sj values between u and w as free v 2 f(e2)? f(ek)? s1 s2 sk 4 1 1 1 . . . No such values since no integer between 1 and 2 w u w free values 1 3 5 6 Sj
Algorithm Outline . . . set u = max{si} while not all edges assigned w if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w mark all Sj values between u and w as free remove all values < w from Sj v 2 f(e2)? f(ek)? s1 s2 sk 4 1 1 . . . w u w free values 1 3 5 6 Sj
Algorithm Outline . . . set u = max{si} while not all edges assigned u if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w mark all Sj values between u and w as free remove all values < w from Sj v 2 f(e2)? f(ek)? s1 s2 sk 4 1 1 . . . u free values 1 3 5 6
Algorithm Outline . . . set u = max{si} while not all edges assigned u if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w mark all Sj values between u and w as free remove all values < w from Sj v 2 f(e2)? f(ek)? s1 s2 sk 4 1 1 . . . Here you can see that the maximal sequence w.r.t w is one that does not contain u ! u w free values 1 3 5 6 Sj
Algorithm Outline . . . set u = max{si} while not all edges assigned u if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w mark all Sj values between u and w as free remove all values < w from Sj v 2 f(e2)? f(ek)? s1 s2 sk 4 1 1 . . . Here you can see that the maximal sequence w.r.t w is one that does not contain u ! u w free values 1 5 6 Sj 31
Algorithm Outline . . . set u = max{si} while not all edges assigned u if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w mark all Sj values between u and w as free remove all values < w from Sj v 2 3 f(ek)? s1 s2 sk 4 1 1 . . . Here you can see that the maximal sequence w.r.t w is one that does not contain u ! u w free values 1 5 6 Sj 32
Algorithm Outline . . . set u = max{si} while not all edges assigned u if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w mark all Sj values between u and w as free remove all values < w from Sj v 2 3 f(ek)? s1 s2 sk 4 1 . . . Here you can see that the maximal sequence w.r.t w is one that does not contain u ! u w free values 1 5 6 Sj 33
Algorithm Outline . . . set u = max{si} while not all edges assigned u if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w mark all Sj values between u and w as free remove all values < w from Sj v 2 3 f(ek)? s1 s2 sk 4 1 . . . Here you can see that the maximal sequence w.r.t w is one that does not contain u ! u free values 1 5 6 34
Algorithm Outline . . . set u = max{si} while not all edges assigned u if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w mark all Sj values between u and w as free remove all values < w from Sj and u = 0 v 2 3 f(ek)? s1 s2 sk 4 1 . . . Here you can see that the maximal sequence w.r.t w is one that does not contain u ! u free values 5 6 35
Algorithm Outline . . . set u = max{si} while not all edges assigned w if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w mark all Sj values between u and w as free remove all values < w from Sj and u = 0 v 2 3 f(ek)? w s1 s2 sk 4 1 . . . Here you can see that the maximal sequence w.r.t w is one that does not contain u ! u w free values 5 6 Sj 36
Algorithm Outline . . . set u = max{si} while not all edges assigned w if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w mark all Sj values between u and w as free remove all values < w from Sj and u = 0 v 2 3 f(ek)? w s1 s2 sk 4 1 . . . Here you can see that the maximal sequence w.r.t w is one that does not contain u ! u w free values 6 Sj 37
Algorithm Outline . . . set u = max{si} while not all edges assigned w if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w (free previous value) mark all Sj values between u and w as free remove all values < w from Sj and u = 0 v 5 3 f(ek)? w s1 s2 sk 4 1 . . . Here you can see that the maximal sequence w.r.t w is one that does not contain u ! u w free values 2 6 Sj 38
Algorithm Outline . . . set u = max{si} while not all edges assigned w if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w (free previous value) mark all Sj values between u and w as free remove all values < w from Sj and u = 0 v 5 3 f(ek)? w s1 s2 sk 1 . . . Here you can see that the maximal sequence w.r.t w is one that does not contain u ! u w free values 2 4 6 Sj 39
Algorithm Outline . . . set u = max{si} while not all edges assigned u if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w (free previous value) mark all Sj values between u and w as free remove all values < w from Sj and u = 0 v 5 3 f(ek)? s1 s2 sk 1 . . . Here you can see that the maximal sequence w.r.t w is one that does not contain u ! u free values 2 4 6 40
Algorithm Outline . . . set u = max{si} while not all edges assigned w if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w (free previous value) mark all Sj values between u and w as free remove all values < w from Sj and u = 0 v 5 3 f(ek)? s1 s2 sk 1 . . . Here you can see that the maximal sequence w.r.t w is one that does not contain u ! w u w free values 2 4 6 Sj 41
That’s it! Algorithm Outline . . . set u = max{si} while not all edges assigned if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w (free previous value) mark all Sj values between u and w as free remove all values < w from Sj and u = 0 v 5 3 2 s1 s2 sk . . . Here you can see that the maximal sequence w.r.t w is one that does not contain u ! w u w free values 4 6 Sj 42
That’s it! Algorithm Outline set u = max{si} while not all edges assigned if u appears once mark u as not free, move to next largest u otherwise: w = smallest free value > u Sj = any maximal sequence w.r.t w mark w as not free set current f(ej) = w (free previous value) mark all Sj values between u and w as free remove all values < w from Sj and u = 0 v s1 s1 5 3 2
Running Time v s1 s2 sk 4 1 1 1 . . .
Running Time Easy in O(|S1| + |S2| +…+ |Sk|) per vertex O(n2) total v s1 s2 sk 4 1 1 1 But we can actually do it in |s_1|+…+|s_k| yielding n^2 time complexity ! . . . 45
Running Time Easy in O(|S1| + |S2| +…+ |Sk|) per vertex O(n2) total But, |S1| + |S2| +…+ |Sk| is not a lower bound! v s1 s2 sk 4 1 1 1 But we can actually do it in |s_1|+…+|s_k| yielding n^2 time complexity ! . . . 46
Running Time Easy in O(|S1| + |S2| +…+ |Sk|) per vertex O(n2) total But, |S1| + |S2| +…+ |Sk| is not a lower bound! in many cases, the largest values of the largest visibility sequence are unchanged at v itself v s1 s2 sk 4 1 1 1 But we can actually do it in |s_1|+…+|s_k| yielding n^2 time complexity ! . . . 47
Σ Linear Running Time . . . q(v) = |S2| +…+ |Sk| k(v) = #v’s children t(v) = largest value that appears in Sv but not in S1 Theorem: an extension can be computed in time O( k(v)+q(v)+t(v) ) Theorem: k(v)+q(v)+t(v) = O(n) Differs from [LY98]: different data structures No bit tricks O(n) decision tree construction v Σ v s1 s2 sk 4 1 1 1 Different than Lam and Yue’s solution. Many details omitted . . .
From Strategy Function to Decision Tree in O(n) Time
From Strategy Function to Decision Tree in O(n) Time (a,c)? 2 1 a c 3 b d c (a,b)? (c,e)? 2 1 a b c e e f 1 (a,d)? b (c,f)? (e,g)? a d c f e g g a d c f e g
From Strategy Function to Decision Tree in O(n) Time (a,c)? For all edges e let s = visibility sequence at bottom(e) if s contains no values smaller than f(e) set bottom(e) as the solution when the query on e returns bottom(e) else, let v1 <...< vk < f(e) in s, let ei be the edge vi is assigned to set ek as the solution when the query on e returns bottom(e) for every 1≤ i <k set ei as the solution when the query on ei+1 returns top(ei+1) set top(e1) as the solution when the query on e1 returns top(e1) 2 1 a c 3 b d c (a,b)? (c,e)? 2 1 a b c e e f 1 (a,d)? b (c,f)? (e,g)? a d c f e g g a d c f e g
Thank You!