One algorithm to rule them all One join query at a time Atri Rudra University at Buffalo.


1 One algorithm to rule them all One join query at a time Atri Rudra University at Buffalo

2 A brief history of this talk: L2/L2 "for each" sparse recovery/compressed sensing http://www-stat.stanford.edu/~candes/stats330/index.shtml

3 The key technical problem Given the three shadows, what is the largest size of the original set of points ?

4 The key technical problem Highly trivial: $4^3 = 64$. Still trivial: $4^2 = 16$. Correct answer: $4^{1.5} = 8$.
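The arithmetic on this slide can be checked on a concrete point set. The sketch below is not from the talk: it assumes the point set is the unit cube $\{0,1\}^3$ (our choice of example), whose three shadows each contain 4 points, and confirms that it has $4^{1.5} = 8$ points, so the bound is attained. The function name `shadows` is hypothetical.

```python
from itertools import product

def shadows(points):
    """Project a set of 3D points (a, b, c) onto the three coordinate planes."""
    ab = {(a, b) for a, b, c in points}
    bc = {(b, c) for a, b, c in points}
    ca = {(c, a) for a, b, c in points}
    return ab, bc, ca

# Assumed example: the 8 corners of the unit cube.
cube = set(product((0, 1), repeat=3))
ab, bc, ca = shadows(cube)

shadow_size = len(ab)        # each shadow has 4 points
bound = shadow_size ** 1.5   # 4^{1.5} = 8
print(len(cube), len(ab), len(bc), len(ca), bound)
```

Here the "highly trivial" and "still trivial" counts (64 and 16) overshoot, while the exponent 1.5 matches the actual number of points.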

5 The key technical problem Shadows R, S, T on the planes spanned by A, B, C, with $|R| = |S| = |T| = k$. Loomis-Whitney: the original set has at most $k^{3/2}$ points. Algorithmic Loomis-Whitney?

6 An equivalent view Triangle on A, B, C with edges R, S, T. Output all (a,b,c) s.t. (a,b) in R, (b,c) in S and (c,a) in T

7 Overview of the talk A B C R S T

8 The take-away message Join algo http://welovetumblr.blogspot.com/2012/07/thor-is.html

9 Overview of the talk A B C R S T

10 (Database) Joins (Codd) Attributes/Nodes: $[n]$. Relations/Hyperedges: $e_1, \ldots, e_m \subseteq [n]$. Tables/Projections: $R_1, \ldots, R_m$. Output all $a = (a_1, \ldots, a_n)$ s.t. $a$ projected down to $e_i$ is in $R_i$ for every $i$ in $[m]$

11 The triangle join query Relations R on (A,B), S on (B,C), T on (C,A). Output all (a,b,c) s.t. (a,b) in R, (b,c) in S and (c,a) in T
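As a baseline, the query can be evaluated by a plain nested loop. The sketch below is illustrative only and is not the algorithm of the talk: it pairs every tuple of R with every tuple of S and then looks up T, so it does roughly $|R| \cdot |S|$ work. The function name `triangle_join_naive` and the sample relations are assumptions made for this example.

```python
def triangle_join_naive(R, S, T):
    """Return all (a, b, c) with (a, b) in R, (b, c) in S and (c, a) in T.

    Baseline nested-loop join: about |R| * |S| steps, regardless of output size.
    """
    T = set(T)  # constant-time membership test for the last relation
    out = []
    for a, b in R:
        for b2, c in S:
            if b2 == b and (c, a) in T:
                out.append((a, b, c))
    return out

# Hypothetical sample data.
R = {(1, 2), (1, 3)}
S = {(2, 3), (3, 1)}
T = {(3, 1), (1, 1)}
print(sorted(triangle_join_naive(R, S, T)))  # [(1, 2, 3), (1, 3, 1)]
```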

12 Bounding the output size (Atserias, Grohe, Marx) Highly trivial bound: $|R| \cdot |S| \cdot |T|$. Still trivial bound: $|R| \cdot |S|$. Loomis-Whitney bound: $|R|^{1/2} \cdot |S|^{1/2} \cdot |T|^{1/2}$ (weights ½, ½, ½). AGM bound: $|R|^{x} \cdot |S|^{y} \cdot |T|^{z}$ for weights $x, y, z \ge 0$ with $x + z \ge 1$ (covering A), $x + y \ge 1$ (covering B), $y + z \ge 1$ (covering C).
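The four bounds can be compared numerically. The sketch below is an illustration under stated assumptions, not code from the talk: it assumes all relation sizes are at least 1, so the best AGM weights for the triangle are found at one of the four corner points of the constraint region, namely (½, ½, ½), (1, 1, 0), (1, 0, 1) and (0, 1, 1). The function name `triangle_bounds` is hypothetical.

```python
def triangle_bounds(r, s, t):
    """Output-size bounds for the triangle query given |R| = r, |S| = s, |T| = t.

    Assumes r, s, t >= 1. Under that assumption the smallest AGM bound is
    attained at a corner of {x+z>=1, x+y>=1, y+z>=1, x,y,z>=0}.
    """
    corners = [(0.5, 0.5, 0.5), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
    agm = min(r ** x * s ** y * t ** z for x, y, z in corners)
    return {
        "highly_trivial": r * s * t,
        "still_trivial": r * s,
        "loomis_whitney": (r * s * t) ** 0.5,
        "agm": agm,
    }

print(triangle_bounds(16, 16, 16))     # balanced sizes: AGM equals Loomis-Whitney
print(triangle_bounds(10 ** 6, 4, 4))  # skewed sizes: AGM picks weights (0, 1, 1)
```

With balanced sizes the AGM bound coincides with the Loomis-Whitney bound; with one very large relation the optimal weights ignore it entirely.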

13 Loomis Whitney ?

14 Algorithmic Loomis-Whitney Loomis-Whitney bound: $|R|^{1/2} \cdot |S|^{1/2} \cdot |T|^{1/2}$. Goal: Count number of triangles. For a vertex c in C with degrees $d_S(c)$ and $d_T(c)$: there are $|R|$ choices for edges in R; there are $d_S(c) \cdot d_T(c)$ choices for pairs of neighbors of c. http://agilitrix.com/2011/03/red-pill-blue-pill/

15 Algorithmic Loomis-Whitney Loomis-Whitney bound: $|R|^{1/2} \cdot |S|^{1/2} \cdot |T|^{1/2}$. Goal: Count number of triangles. There are $|R|$ choices for edges in R. There are $d_S(c) \cdot d_T(c)$ choices for pairs of neighbors of c. Make this choice for every c in C. Run time of algo $= \sum_c \min(|R|,\ d_S(c) \cdot d_T(c))$
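The per-vertex choice described on this slide can be sketched as follows. This is a minimal illustration of the idea, assuming the relations are given as sets of pairs and that hash lookups cost constant time; the function name `triangle_join` and the index variables are our own, not from the talk.

```python
from collections import defaultdict

def triangle_join(R, S, T):
    """List all (a, b, c) with (a, b) in R, (b, c) in S, (c, a) in T.

    For each c, do the cheaper of two things:
      - scan all of R (cost |R|), or
      - scan all pairs of neighbors of c (cost d_S(c) * d_T(c)).
    """
    R = set(R)
    S_by_c = defaultdict(set)  # c -> {b : (b, c) in S}
    for b, c in S:
        S_by_c[c].add(b)
    T_by_c = defaultdict(set)  # c -> {a : (c, a) in T}
    for c, a in T:
        T_by_c[c].add(a)

    out = []
    for c in S_by_c.keys() & T_by_c.keys():
        bs, as_ = S_by_c[c], T_by_c[c]
        if len(R) <= len(bs) * len(as_):
            # Cheaper to scan R and check both neighbors of c.
            out.extend((a, b, c) for a, b in R if b in bs and a in as_)
        else:
            # Cheaper to scan neighbor pairs of c and check R.
            out.extend((a, b, c) for b in bs for a in as_ if (a, b) in R)
    return out
```

Up to the cost of building the two indexes, the work done is $\sum_c \min(|R|, d_S(c) \cdot d_T(c))$, matching the run time stated on the slide.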

16 Analyzing the algorithm Loomis-Whitney bound: $|R|^{1/2} \cdot |S|^{1/2} \cdot |T|^{1/2}$. Using $\min(E,F) \le (E \cdot F)^{1/2}$ and then Cauchy-Schwarz: $\sum_c \min(|R|,\ d_S(c) \cdot d_T(c)) \le \sum_c (|R| \cdot d_S(c) \cdot d_T(c))^{1/2} = |R|^{1/2} \cdot \sum_c d_S(c)^{1/2} \cdot d_T(c)^{1/2} \le |R|^{1/2} \cdot (\sum_c d_S(c))^{1/2} \cdot (\sum_c d_T(c))^{1/2} = |R|^{1/2} \cdot |S|^{1/2} \cdot |T|^{1/2}$
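The inequality chain can be sanity-checked numerically. The sketch below is our own illustration, assuming relations are sets of pairs with S holding (b, c) and T holding (c, a); the function names are hypothetical. It computes the left-hand side $\sum_c \min(|R|, d_S(c) \cdot d_T(c))$ and the Loomis-Whitney bound so the two can be compared on any instance.

```python
from collections import Counter

def algorithm_cost(R, S, T):
    """Sum over c of min(|R|, d_S(c) * d_T(c))."""
    d_S = Counter(c for _, c in S)  # degree of c in S
    d_T = Counter(c for c, _ in T)  # degree of c in T
    return sum(min(len(R), d_S[c] * d_T[c]) for c in d_S.keys() & d_T.keys())

def loomis_whitney_bound(R, S, T):
    """|R|^(1/2) * |S|^(1/2) * |T|^(1/2)."""
    return (len(R) * len(S) * len(T)) ** 0.5

# Assumed example: complete relations on a 4-element domain (16 pairs each).
full = {(i, j) for i in range(4) for j in range(4)}
print(algorithm_cost(full, full, full), loomis_whitney_bound(full, full, full))
```

On this complete instance both sides equal 64, so the analysis is tight there.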

17 ? Atserias Grohe Marx

18 Same algorithm! AGM bound: $|R|^{x} \cdot |S|^{y} \cdot |T|^{z}$ with $x + z \ge 1$, $x + y \ge 1$, $y + z \ge 1$. Using $\min(E,F) \le E^{x} \cdot F^{1-x}$ and then Hölder: $\sum_c \min(|R|,\ d_S(c) \cdot d_T(c)) \le \sum_c |R|^{x} \cdot (d_S(c) \cdot d_T(c))^{1-x} \le |R|^{x} \cdot \sum_c d_S(c)^{y} \cdot d_T(c)^{z} \le |R|^{x} \cdot (\sum_c d_S(c))^{y} \cdot (\sum_c d_T(c))^{z} = |R|^{x} \cdot |S|^{y} \cdot |T|^{z}$

19 General Join Result Attributes/Nodes: $[n]$. Relations/Hyperedges: $e_1, \ldots, e_m \subseteq [n]$. Tables/Projections: $R_1, \ldots, R_m$. Let $x_1, \ldots, x_m$ be a fractional cover. AGM bound: $|R_1|^{x_1} \cdot \ldots \cdot |R_m|^{x_m}$. Our result: O(AGM + Input size). Provably worst-case optimal join algorithm
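For a general query, the AGM bound for a given fractional cover is a one-line product once the cover condition is checked. The sketch below is illustrative only: it assumes a fractional cover means every attribute receives total weight at least 1 from the hyperedges containing it (as in the triangle constraints earlier), and the function name `agm_bound` and its parameters are hypothetical. It evaluates the bound for supplied weights and does not search for the best cover.

```python
import math

def agm_bound(attributes, edges, sizes, weights):
    """Return prod_i sizes[i] ** weights[i] for a fractional cover.

    edges[i] is the set of attributes of relation i, sizes[i] its size,
    weights[i] its nonnegative weight. Raises ValueError if some attribute
    is covered with total weight below 1.
    """
    if any(x < 0 for x in weights):
        raise ValueError("weights must be nonnegative")
    for v in attributes:
        cover = sum(x for e, x in zip(edges, weights) if v in e)
        if cover < 1 - 1e-12:
            raise ValueError(f"attribute {v!r} is not covered")
    return math.prod(n ** x for n, x in zip(sizes, weights))

# Triangle query with |R| = |S| = |T| = 16 and weights (1/2, 1/2, 1/2).
edges = [{"A", "B"}, {"B", "C"}, {"C", "A"}]
print(agm_bound("ABC", edges, [16, 16, 16], [0.5, 0.5, 0.5]))  # 64.0
```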

20 List recovery Code C subset of $[q]^n$. Input lists $S_1, S_2, S_3, \ldots, S_n$, each $S_i$ a subset of $[q]$. Output all codewords $(c_1, c_2, c_3, \ldots, c_n)$ that agree with (all) the input lists. Applications in expanders

21 An alternate view of joins Msg in $[q]^3$, codeword in $[q^2]^3$, input lists R, S, T. Constant dimension. Constant block length. Large alphabet size. Large input list size.

22 Overview of the talk A B C R S T

23 Sparse Recovery/Compressed Sensing Unknown To be designed Observed Decode Output k=2 Heavy Hitter Tail

24 Quantifying the approximation $L_2$ (error of the output) $\le C \cdot L_2$ (tail)

25 (Most of) rest of the talk

26 Designing the matrix Unknown To be designed Observed Decode Output k=2

27 Designing the matrix (k = 2) Matrix with N columns and m rows from a k-expander. Measurement = heavy hitter + noise. Heavy hitter collisions: < ¼ (neighborhood). Tail noise: < ¼ (neighborhood). > ½ of the neighbors of a heavy hitter have the "correct" value

28 Count-Sketch style algo (k = 2; N columns, m rows) Estimate = median of O(log N) values. Output the top O(k) estimates. O(N log N) decoding. (Indyk, Ružić)
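The "median of several values" estimate can be illustrated with a classic Count-Sketch. This is a stand-in under stated assumptions, not the construction of the talk: the slides build the matrix from a k-expander, whereas the sketch below uses independent random bucket and sign choices per row. The class name `CountSketch` and its parameters `width`, `depth` and `seed` are hypothetical.

```python
import random
from statistics import median

class CountSketch:
    """Toy Count-Sketch: `depth` rows, each hashing n coordinates into `width` buckets."""

    def __init__(self, n, width, depth, seed=0):
        rng = random.Random(seed)
        # Explicit random tables stand in for hash functions (assumption).
        self.bucket = [[rng.randrange(width) for _ in range(n)] for _ in range(depth)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(depth)]
        self.table = [[0.0] * width for _ in range(depth)]

    def measure(self, x):
        """Accumulate the linear measurements of the vector x."""
        for r, row in enumerate(self.table):
            for i, v in enumerate(x):
                row[self.bucket[r][i]] += self.sign[r][i] * v

    def estimate(self, i):
        """Estimate x[i] as the median of its per-row readings."""
        return median(
            self.sign[r][i] * self.table[r][self.bucket[r][i]]
            for r in range(len(self.table))
        )

# Hypothetical usage: one heavy coordinate and no tail noise.
x = [0.0] * 100
x[7] = 5.0
cs = CountSketch(n=100, width=16, depth=7)
cs.measure(x)
print(cs.estimate(7))  # 5.0
```

Decoding by estimating every coordinate and keeping the largest ones costs time proportional to N times the number of rows, which is the linear-time behaviour the following slides set out to avoid.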

29 We need a faster algorithm…

30 Towards a sub-linear time algo Candidate set S. Estimate = median value. Output the top O(k) estimates in S. O(|S| log N) decoding. All we need to do is to compute a small S quickly

31 Porat-Strauss Idea: Recursion! View $[N]$ as $\{0,1\}^{\log N}$ and recurse on $[\sqrt{N}]$. Solve in ~ $\sqrt{N}$ time

32 The problem we now need to solve Elements of S. Geometrically… (k by k picture) Output size ~ $k^2$. Overall running time ~ $\sqrt{N} + k^2$. Not sub-linear for $k > \sqrt{N}$. Use a table look-up to decrease the run time

33 Finally…

34 Slightly different recursion From $[N]$ ($\log N$ bits) to $[N^{2/3}]$. Geometric problem to solve. Overall runtime $k^{3/2} + N^{2/3}$

35 Our Results L2/L2 sparse recovery with failure prob p. Optimal $k \log(N/k)$ measurements*. $k^{1+\varepsilon}\ \mathrm{polylog}\ N$ decoding + space. $p \sim (N/k)^{-k/\mathrm{polylog}\ k}$. Also prove tight lower bound of $k \log(N/k) + \log(1/p)$

36 One algorithm to rule them all One join query at a time Atri Rudra University at Buffalo

37 Only two problems so far… A B C R S T

38 Albert Meyer ( via Dick Lipton ) "Prove it for n=3 and then let 3 go to infinity"

39 The 3rd problem… Big (hyper)graph G, small (hyper)graph H. Compute all copies of H in G. Our join algorithm gives a worst-case optimal algorithm for any constant-sized H. Joins model many more problems, e.g. CSPs. http://pigeonsandplanes.com/2010/12/thoughts-on-net-neutrality.html

40 The take-away message Join algo http://welovetumblr.blogspot.com/2012/07/thor-is.html

