1
One algorithm to rule them all: One join query at a time. Atri Rudra, University at Buffalo.
2
A brief history of this talk: L2/L2 for-each sparse recovery/compressed sensing. http://www-stat.stanford.edu/~candes/stats330/index.shtml
3
The key technical problem: given the three shadows, what is the largest size of the original set of points?
4
The key technical problem. Highly trivial bound: 4^3 = 64. Still trivial: 4^2 = 16. Correct answer: 4^{1.5} = 8.
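The extremal configuration behind the answer 8 = 4^{1.5} is a 2 x 2 x 2 cube: each of its three axis-aligned shadows is a 2 x 2 square of 4 points. A minimal Python check of this example (my own illustration, not part of the slides):

from itertools import product

# The 2x2x2 cube: 8 = 4^{1.5} points whose three shadows each have 4 points.
points = set(product((0, 1), repeat=3))

shadow_ab = {(a, b) for (a, b, c) in points}   # project away the C axis
shadow_bc = {(b, c) for (a, b, c) in points}   # project away the A axis
shadow_ca = {(c, a) for (a, b, c) in points}   # project away the B axis

assert len(points) == 8
assert len(shadow_ab) == len(shadow_bc) == len(shadow_ca) == 4
print(len(points), len(shadow_ab), len(shadow_bc), len(shadow_ca))  # 8 4 4 4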
5
The key technical problem. Shadows R, S, T of the point set on the three coordinate planes, with |R| = |S| = |T| = k. Loomis-Whitney: the set has at most k^{3/2} points. Algorithmic Loomis-Whitney?
6
An equivalent view. The axes A, B, C become attributes and the shadows R, S, T become binary relations: output all (a, b, c) s.t. (a, b) in R, (b, c) in S and (c, a) in T.
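A brute-force rendering of this triangle query (my own baseline sketch, not the algorithm of the talk): scan R x S and filter with T, which in the worst case does |R| * |S| work.

def triangle_join_naive(R, S, T):
    """List all (a, b, c) with (a, b) in R, (b, c) in S and (c, a) in T.

    R, S, T are sets of pairs.  Worst case ~ |R| * |S| work, which can be
    far above the k^{3/2} output bound.
    """
    T = set(T)
    out = []
    for (a, b) in R:
        for (b2, c) in S:
            if b2 == b and (c, a) in T:
                out.append((a, b, c))
    return out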
7
Overview of the talk (running example: the triangle query on attributes A, B, C with relations R, S, T).
8
The take-away message: one join algo. (Image: http://welovetumblr.blogspot.com/2012/07/thor-is.html)
9
Overview of the talk A B C R S T
10
(Database) Joins [Codd]. Attributes/Nodes: [n]. Relations/Hyperedges: e_1, …, e_m ⊆ [n]. Tables/Projections: R_1, …, R_m. Output all a = (a_1, …, a_n) s.t. a projected down to e_i is in R_i for every i in [m].
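Read literally, the definition is a filter over all assignments. A minimal sketch of that reading (my own illustration; the attribute domains are assumed to be given explicitly, and the loop is exponential in n, so it only pins down the semantics):

from itertools import product

def brute_force_join(domains, edges, tables):
    """Output all a = (a_1, ..., a_n) whose projection onto e_i lies in R_i.

    domains: list of n iterables, one per attribute.
    edges:   list of m tuples of attribute indices (the hyperedges e_i).
    tables:  list of m sets of tuples (the relations R_i), aligned with edges.
    """
    out = []
    for a in product(*domains):
        if all(tuple(a[j] for j in e) in R for e, R in zip(edges, tables)):
            out.append(a)
    return out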
11
The triangle join query: output all (a, b, c) s.t. (a, b) in R, (b, c) in S and (c, a) in T. (As a hypergraph: attributes A, B, C with edges R = {A, B}, S = {B, C}, T = {C, A}.)
12
Bounding the output size [Atserias, Grohe, Marx]. Highly trivial bound: |R| |S| |T|. Still trivial bound: |R| |S|. Loomis-Whitney bound: |R|^{1/2} |S|^{1/2} |T|^{1/2}. AGM bound: |R|^x |S|^y |T|^z for any fractional edge cover (x, y, z), i.e. x + y ≥ 1, y + z ≥ 1, x + z ≥ 1 and x, y, z ≥ 0.
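The tightest AGM exponents minimize x·log|R| + y·log|S| + z·log|T| over this polytope, a small linear program. A sketch of that computation (my own illustration; assumes scipy is available):

import math
from scipy.optimize import linprog

def agm_exponents(sizes):
    """Best fractional-cover exponents (x, y, z) for the triangle query.

    sizes = (|R|, |S|, |T|).  Minimizes x*log|R| + y*log|S| + z*log|T|
    subject to x+y >= 1, y+z >= 1, x+z >= 1 and x, y, z >= 0, i.e. the
    tightest AGM bound |R|^x |S|^y |T|^z.
    """
    c = [math.log(s) for s in sizes]
    # linprog expects A_ub @ v <= b_ub, so negate the >= constraints.
    A_ub = [[-1, -1, 0],   # x + y >= 1  (attribute B)
            [0, -1, -1],   # y + z >= 1  (attribute C)
            [-1, 0, -1]]   # x + z >= 1  (attribute A)
    b_ub = [-1, -1, -1]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
    return res.x

x, y, z = agm_exponents((1000, 1000, 1000))
print(x, y, z)  # roughly (0.5, 0.5, 0.5), recovering the Loomis-Whitney bound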
13
Loomis-Whitney?
14
Algorithmic Loomis-Whitney. Loomis-Whitney bound: |R|^{1/2} |S|^{1/2} |T|^{1/2}. Goal: count the triangles. For each vertex c there are two ways to find its triangles (red pill/blue pill: http://agilitrix.com/2011/03/red-pill-blue-pill/): there are |R| choices for edges in R, and there are d_S(c) · d_T(c) choices for pairs of neighbors of c (where d_S(c) and d_T(c) are the degrees of c in S and T).
15
Algorithmic Loomis-Whitney. Loomis-Whitney bound: |R|^{1/2} |S|^{1/2} |T|^{1/2}. Goal: count the triangles. There are |R| choices for edges in R, and d_S(c) · d_T(c) choices for pairs of neighbors of c; make the cheaper choice for every c in C. Run time of the algorithm = Σ_c min(|R|, d_S(c) · d_T(c)).
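A Python rendering of this per-c choice (my own sketch of the slide's algorithm; set and dict lookups are treated as constant time):

from collections import defaultdict

def triangle_join_lw(R, S, T):
    """List all (a, b, c) with (a, b) in R, (b, c) in S and (c, a) in T.

    For each value c, either scan all of R or scan the d_S(c) * d_T(c)
    candidate (a, b) pairs around c, whichever is cheaper -- total work
    sum_c min(|R|, d_S(c) * d_T(c)), at most |R|^{1/2} |S|^{1/2} |T|^{1/2}.
    """
    R, S, T = set(R), set(S), set(T)
    S_nbrs = defaultdict(set)   # c -> {b : (b, c) in S}
    T_nbrs = defaultdict(set)   # c -> {a : (c, a) in T}
    for (b, c) in S:
        S_nbrs[c].add(b)
    for (c, a) in T:
        T_nbrs[c].add(a)

    out = []
    for c in set(S_nbrs) & set(T_nbrs):
        if len(S_nbrs[c]) * len(T_nbrs[c]) <= len(R):
            # Cheaper side: enumerate the neighbor pairs of c and check R.
            for b in S_nbrs[c]:
                for a in T_nbrs[c]:
                    if (a, b) in R:
                        out.append((a, b, c))
        else:
            # Cheaper side: scan R and check both neighborhoods of c.
            for (a, b) in R:
                if b in S_nbrs[c] and a in T_nbrs[c]:
                    out.append((a, b, c))
    return out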
16
Analyzing the algorithm. Loomis-Whitney bound: |R|^{1/2} |S|^{1/2} |T|^{1/2}. Using min(E, F) ≤ (E · F)^{1/2} and Cauchy-Schwarz:
Σ_c min(|R|, d_S(c) · d_T(c)) ≤ Σ_c (|R| · d_S(c) · d_T(c))^{1/2} = |R|^{1/2} Σ_c d_S(c)^{1/2} d_T(c)^{1/2} ≤ |R|^{1/2} (Σ_c d_S(c))^{1/2} (Σ_c d_T(c))^{1/2} = |R|^{1/2} |S|^{1/2} |T|^{1/2}.
17
Atserias-Grohe-Marx?
18
Same algorithm! AGM bound: |R|^x |S|^y |T|^z, for any x + y ≥ 1, y + z ≥ 1, x + z ≥ 1. Using min(E, F) ≤ E^x F^{1-x} and Hölder:
Σ_c min(|R|, d_S(c) · d_T(c)) ≤ Σ_c |R|^x (d_S(c) · d_T(c))^{1-x} ≤ |R|^x Σ_c d_S(c)^y d_T(c)^z ≤ |R|^x (Σ_c d_S(c))^y (Σ_c d_T(c))^z = |R|^x |S|^y |T|^z.
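The second inequality also uses that the cover constraints give 1 − x ≤ y and 1 − x ≤ z, together with d_S(c), d_T(c) ≥ 1. A LaTeX write-up of the chain with each step annotated (my own expansion of the slide):

\begin{align*}
\sum_c \min\bigl(|R|,\, d_S(c)\,d_T(c)\bigr)
  &\le \sum_c |R|^{x}\bigl(d_S(c)\,d_T(c)\bigr)^{1-x}
     && \text{since } \min(E,F)\le E^{x}F^{1-x}\\
  &\le |R|^{x}\sum_c d_S(c)^{y}\, d_T(c)^{z}
     && \text{since } 1-x\le y,\ 1-x\le z,\ d_S(c),d_T(c)\ge 1\\
  &\le |R|^{x}\Bigl(\sum_c d_S(c)\Bigr)^{y}\Bigl(\sum_c d_T(c)\Bigr)^{z}
     && \text{Hölder, using } y+z\ge 1\\
  &= |R|^{x}\,|S|^{y}\,|T|^{z}
     && \text{since } \textstyle\sum_c d_S(c)=|S|,\ \sum_c d_T(c)=|T|.
\end{align*}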
19
General Join Result. Attributes/Nodes: [n]. Relations/Hyperedges: e_1, …, e_m ⊆ [n]. Tables/Projections: R_1, …, R_m. Let x_1, …, x_m be a fractional cover. AGM bound: |R_1|^{x_1} · … · |R_m|^{x_m}. Our result: run time O(AGM bound + input size), a provably worst-case optimal join algorithm.
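For a feel of how one algorithm can cover every such query, here is an attribute-at-a-time sketch in the spirit of generic worst-case-optimal joins. It is my own illustration and not the algorithm analyzed in the talk; in particular, this plain-dict version rescans the relations at every step, so without suitable index structures it does not actually achieve the AGM run time.

def generic_join(attrs, relations, bound=()):
    """Attribute-at-a-time join.

    attrs:     list of attribute names still to be assigned.
    relations: list of (schema, tuples) pairs; schema is a tuple of attribute
               names and tuples is a set of tuples in schema order.
    bound:     tuple of (attribute, value) pairs fixed so far.
    Returns all full assignments (as dicts) consistent with every relation.
    """
    if not attrs:
        return [dict(bound)]
    attr, rest = attrs[0], attrs[1:]
    fixed = dict(bound)

    def consistent(schema, t):
        return all(fixed.get(a, v) == v for a, v in zip(schema, t))

    # Intersect the candidate values for `attr` over all relations using it.
    candidates = None
    for schema, tuples in relations:
        if attr not in schema:
            continue
        i = schema.index(attr)
        vals = {t[i] for t in tuples if consistent(schema, t)}
        candidates = vals if candidates is None else candidates & vals
    if candidates is None:      # no relation mentions this attribute
        return []

    out = []
    for v in sorted(candidates):
        out.extend(generic_join(rest, relations, bound + ((attr, v),)))
    return out

# The triangle query as a usage example:
R = {(1, 2), (2, 3), (3, 1)}
S = {(2, 3), (3, 1), (1, 2)}
T = {(3, 1), (1, 2), (2, 3)}
print(generic_join(['A', 'B', 'C'],
                   [(('A', 'B'), R), (('B', 'C'), S), (('C', 'A'), T)]))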
20
List recovery. Code C ⊆ [q]^n. Input: lists S_1, …, S_n, each S_i a subset of [q]. Output all codewords (c_1, …, c_n) that agree with (all) the input lists, i.e. c_i ∈ S_i for every i. Applications in expanders.
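For a code given explicitly as a set of codewords the definition is a one-line filter; a minimal sketch (my own illustration; real list-recovery decoders exploit the code's structure instead of enumerating it):

def list_recover(code, lists):
    """Return all codewords c in `code` with c[i] in lists[i] for every i.

    code:  iterable of length-n tuples over the alphabet [q].
    lists: sequence of n sets S_1, ..., S_n, each a subset of the alphabet.
    """
    return [c for c in code if all(c[i] in S for i, S in enumerate(lists))]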
21
An alternate view of joins. A message (a, b, c) in [q]^3 is encoded as the codeword ((a, b), (b, c), (c, a)) in [q^2]^3, and the input lists are R, S, T. The regime here: constant dimension, constant block length, large alphabet size, large input list size.
22
Overview of the talk (running example: the triangle query on attributes A, B, C with relations R, S, T).
23
Sparse Recovery/Compressed Sensing. An unknown vector x; a measurement matrix to be designed; the observed measurements; a decoding procedure; the output approximation. Running example with k = 2: the k largest entries are the heavy hitters, the rest is the tail.
24
Quantifying the approximation (the L2/L2 guarantee): ||x − output||_2 ≤ C · ||tail||_2, where the tail is x with its k heaviest entries removed.
25
(Most of the) rest of the talk
26
Designing the matrix (same picture: unknown vector x, matrix to be designed, observed measurements, decode, output; k = 2).
27
Designing the matrix (k = 2). The measurement matrix is the bipartite adjacency matrix of a k-expander with N left nodes and m right nodes (measurements). Each measurement = (heavy hitter's value) + noise. Expansion guarantees that < ¼ of a heavy hitter's neighborhood is spoiled by the other heavy hitters and < ¼ by the heavy-tail noise, so > ½ of its neighbors see the "correct" value.
28
Count-Sketch style algo [Indyk, Ružić] (k = 2). Estimate each coordinate as the median of its O(log N) measurements; output the top O(k) estimates. O(N log N) decoding.
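For concreteness, a classic Count-Sketch-style estimator in Python (median of sign-adjusted counters). This is my own illustration of the "median of O(log N) values, output the top O(k)" recipe, not the expander-based matrix of the talk:

import random
from statistics import median

def count_sketch(x, num_rows, num_buckets, seed=0):
    """Build a Count-Sketch of x and return per-coordinate estimates.

    x: list of N values.  Each of the num_rows repetitions hashes every
    coordinate into one of num_buckets counters with a random +/-1 sign;
    a coordinate's estimate is the median over rows of sign * counter.
    Explicit random tables stand in for hash functions (fine for a demo).
    """
    rng = random.Random(seed)
    N = len(x)
    buckets = [[rng.randrange(num_buckets) for _ in range(N)] for _ in range(num_rows)]
    signs = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(num_rows)]

    counters = [[0.0] * num_buckets for _ in range(num_rows)]
    for r in range(num_rows):
        for i, v in enumerate(x):
            counters[r][buckets[r][i]] += signs[r][i] * v

    return [median(signs[r][i] * counters[r][buckets[r][i]]
                   for r in range(num_rows)) for i in range(N)]

# Recover the top-k estimates, as on the slide.
x = [0.0] * 1000
x[7], x[42] = 100.0, -80.0                       # k = 2 heavy hitters
est = count_sketch(x, num_rows=9, num_buckets=50)
top = sorted(range(len(x)), key=lambda i: abs(est[i]), reverse=True)[:2]
print(top)   # with high probability: [7, 42]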
29
We need a faster algorithm…
30
Towards a sub-linear time algo. Restrict attention to a small candidate set S: estimate = median value, output the top O(k) estimates in S, giving O(|S| log N) decoding. All we need to do is compute a small S quickly.
31
Porat-Strauss idea: recursion! Map the coordinates [N] (each carrying its log N-bit identity in {0,1}^{log N}) down to a problem over [√N], and solve that in ~ √N time.
32
The problem we now need to solve. Geometrically, the elements of S live in a k × k grid, so the output size is ~ k^2 and the overall running time is ~ √N + k^2, which is not sub-linear for k > √N. Use a table look-up to decrease the run time.
33
Finally…
34
Slightly different recursion: map [N] (with log N-bit identities) down to [N^{2/3}]; a geometric problem to solve; overall runtime ~ k^{3/2} + N^{2/3}.
35
Our Results. L2/L2 sparse recovery with failure probability p: optimal k log(N/k) measurements*, k^{1+ε} poly-log N decoding time and space, for p ~ (N/k)^{-k/poly-log k}. We also prove a tight lower bound of k log(N/k) + log(1/p).
36
One algorithm to rule them all: One join query at a time. Atri Rudra, University at Buffalo.
37
Only two problems so far… (the triangle query on attributes A, B, C with relations R, S, T).
38
Albert Meyer (via Dick Lipton): "Prove it for n = 3 and then let 3 go to infinity."
39
The 3rd problem… Given a big (hyper)graph G (image: http://pigeonsandplanes.com/2010/12/thoughts-on-net-neutrality.html) and a small (hyper)graph H, compute all copies of H in G. Our join algorithm gives a worst-case optimal algorithm for any constant-sized H. Joins model many more problems, e.g. CSPs.
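For H a triangle, "all copies of H in G" is exactly the triangle query from earlier. A small usage note reusing the triangle_join_lw sketch introduced after the Algorithmic Loomis-Whitney slide (again my own illustration):

# List the triangles of an undirected graph G by feeding its edge set,
# closed under both orientations, in as each of R, S and T.
edges = {(1, 2), (2, 3), (3, 1), (3, 4)}
E = edges | {(v, u) for (u, v) in edges}
triangles = {tuple(sorted(t)) for t in triangle_join_lw(E, E, E)}
print(triangles)   # {(1, 2, 3)}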
40
The take-away message: one join algo. (Image: http://welovetumblr.blogspot.com/2012/07/thor-is.html)