Sublinear Algorithmic Tools 2

Alex Andoni

Plan
Dimension reduction
  Application: Numerical Linear Algebra
Sketching
  Application: Streaming
  Application: Nearest Neighbor Search
and more…
Dimension reduction: linear map $S: \mathbb{R}^n \to \mathbb{R}^k$ s.t. for any points $p, q \in \mathbb{R}^n$:
  $\Pr_S\left[ \frac{\|S(p) - S(q)\|}{\|p - q\|} \in 1 \pm \epsilon \right] \ge 1 - \delta$
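
The guarantee above can be checked empirically. The following is a minimal sketch, assuming $S$ is a $k \times n$ matrix of i.i.d. Gaussian entries scaled by $1/\sqrt{k}$ (the slide does not fix the distribution). The function names and the sizes $n = 1000$, $k = 400$ are illustrative choices.

```python
import math
import random

def gaussian_map(n, k, seed=0):
    """k x n matrix with i.i.d. N(0, 1/k) entries (the scaling is an assumption)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(k)]

def apply_map(S, p):
    """Compute the linear map S(p) = S p."""
    return [sum(s_ij * p_j for s_ij, p_j in zip(row, p)) for row in S]

def l2_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distortion(n=1000, k=400, seed=0):
    """Ratio ||S(p) - S(q)|| / ||p - q|| for two random points p, q."""
    rng = random.Random(seed + 1)
    p = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    q = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    S = gaussian_map(n, k, seed)
    return l2_distance(apply_map(S, p), apply_map(S, q)) / l2_distance(p, q)

ratio = distortion()
print(ratio)  # expected to be close to 1
```

Because $S$ is linear, $S(p) - S(q) = S(p - q)$, so the ratio depends only on the direction of $p - q$.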

Dimension reduction in other norms/distances?
E.g., $\ell_1$? Essentially no [CS’02, BC’03, LN’04, JN’10…]
  For $N$ points, $D$ approximation: dimension between $N^{\Omega(1/D^2)}$ and $O(N/D)$ [BC03, NR10, ANN10…]
  even if the map depends on the dataset!
  In contrast to $\ell_2$: [JL] gives $O(\epsilon^{-2} \log N)$, and doesn’t depend on the dataset
Generalize the notion of dimension reduction!

Computational view
$S: M \to \mathbb{R}^k$, followed by an arbitrary computation $C: \mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}_+$
  Dimension reduction estimates $d_M(x,y) \approx \sqrt{\sum_{i=1}^k (F_i(x) - F_i(y))^2}$; sketching replaces the right-hand side by $C(S(x), S(y))$
  Cons: less geometric structure (e.g., $C$ is not a metric)
  Pros: more expressibility: better trade-off between approximation and “dimension” $k$
Sketch $S: M \to \{0,1\}^k$, with $C: \{0,1\}^k \times \{0,1\}^k \to \mathbb{R}$: “functional compression scheme” for estimating distances
  almost all are lossy ($1+\epsilon$ distortion or more) and randomized

Sketching for $\ell_1$
Analog of Euclidean projections?
For $\ell_2$, we used: the Gaussian distribution has the stability property:
  $g_1 z_1 + g_2 z_2 + \dots + g_d z_d$ is distributed as $g \cdot \|z\|_2$
Is there something similar for the 1-norm?
  Yes: Cauchy distribution!
  1-stable: $c_1 z_1 + c_2 z_2 + \dots + c_d z_d$ is distributed as $c \cdot \|z\|_1$
  $\mathrm{pdf}(s) = \frac{1}{\pi(s^2+1)}$
What’s wrong then?
  Cauchy variables are heavy-tailed…
  they don’t even have a finite expectation (of the absolute value)
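
The 1-stability claim can be justified with characteristic functions. This is a standard argument that the slide does not spell out. It assumes $c, c_1, \dots, c_d$ are independent standard Cauchy variables and uses the known fact that $\mathbb{E}[e^{itc}] = e^{-|t|}$:

```latex
\mathbb{E}\Big[e^{it\sum_{j=1}^d c_j z_j}\Big]
= \prod_{j=1}^d \mathbb{E}\big[e^{i(t z_j) c_j}\big]
= \prod_{j=1}^d e^{-|t z_j|}
= e^{-|t|\,\|z\|_1}
= \mathbb{E}\big[e^{it\,(c\,\|z\|_1)}\big].
```

Two random variables with the same characteristic function have the same distribution, so $\sum_j c_j z_j$ is distributed as $c \cdot \|z\|_1$.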

Sketching for $\ell_1$ [Indyk’00]
Still, can consider a similar random map
  $S(x) = \mathbf{C}x = (C_1 x, C_2 x, \dots, C_k x)$
Consider $S(x) - S(y) = \mathbf{C}x - \mathbf{C}y = \mathbf{C}(x - y) = \mathbf{C}z$
  where $z = x - y$
  each coordinate is distributed as $\|z\|_1 \times$ Cauchy
Take the 1-norm $\|\mathbf{C}z\|_1$?
  does not have finite expectation, but…
Can estimate $\|z\|_1$ by:
  $\mathrm{median}(|C_1 z|, |C_2 z|, \dots, |C_k z|)$
Correctness claim: for each $i$
  $\Pr[|C_i z| > \|z\|_1 \cdot (1-\epsilon)] > 1/2 + \Omega(\epsilon)$
  $\Pr[|C_i z| < \|z\|_1 \cdot (1+\epsilon)] > 1/2 + \Omega(\epsilon)$
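
The median estimator can be tried on a small vector. This is a minimal sketch under stated assumptions: Cauchy samples are drawn with the inverse-CDF formula $\tan(\pi(u - 1/2))$, and the names `cauchy` and `l1_estimate`, the test vector, and $k = 4001$ are illustrative choices not taken from the slides.

```python
import math
import random
import statistics

def cauchy(rng):
    """Standard Cauchy sample via the inverse CDF: tan(pi * (u - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def l1_estimate(z, k, seed=0):
    """Median of |C_i z| over k independent rows C_i of i.i.d. Cauchy entries."""
    rng = random.Random(seed)
    projections = []
    for _ in range(k):
        c_i_z = sum(cauchy(rng) * z_j for z_j in z)
        projections.append(abs(c_i_z))
    return statistics.median(projections)

z = [3.0, -1.0, 0.0, 2.5, -4.5]        # exact 1-norm: 11
estimate = l1_estimate(z, k=4001)
print(estimate, sum(abs(v) for v in z))
```

The median works here because the median of $|c|$ for a standard Cauchy $c$ is exactly 1, so the median of $|C_i z|$ concentrates around $\|z\|_1$ even though the mean does not exist.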

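Estimator for $\ell_1$
Estimator: $\mathrm{median}(|C_1 z|, |C_2 z|, \dots, |C_k z|)$
Correctness claim: for each $i$
  $\Pr[|C_i z| > \|z\|_1 \cdot (1-\epsilon)] > 1/2 + \Omega(\epsilon)$
  $\Pr[|C_i z| < \|z\|_1 \cdot (1+\epsilon)] > 1/2 + \Omega(\epsilon)$
Proof:
  $|C_i z|$ is distributed as $\mathrm{abs}(\|z\|_1 c) = \|z\|_1 \cdot |c|$
  Hence the claim is equivalent to
    $\Pr[|c| > 1-\epsilon] > 1/2 + \Omega(\epsilon)$
    $\Pr[|c| < 1+\epsilon] > 1/2 + \Omega(\epsilon)$
  Matter of checking the pdf of the Cauchy variables…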
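
One way to carry out the pdf check, as a worked sketch assuming $0 < \epsilon \le 1$ (the explicit constants $1/\pi$ and $2/(5\pi)$ are derived here and do not appear on the slide):

```latex
\begin{align*}
\Pr[|c| \le t] &= \int_{-t}^{t} \frac{ds}{\pi(s^2+1)} = \frac{2}{\pi}\arctan t,
\qquad \Pr[|c| \le 1] = \frac12, \\
\Pr[|c| > 1-\epsilon]
&= \frac12 + \frac{2}{\pi}\int_{1-\epsilon}^{1} \frac{ds}{s^2+1}
\ge \frac12 + \frac{2}{\pi}\cdot\frac{\epsilon}{2}
= \frac12 + \frac{\epsilon}{\pi}, \\
\Pr[|c| < 1+\epsilon]
&= \frac12 + \frac{2}{\pi}\int_{1}^{1+\epsilon} \frac{ds}{s^2+1}
\ge \frac12 + \frac{2}{\pi}\cdot\frac{\epsilon}{5}
= \frac12 + \frac{2\epsilon}{5\pi}.
\end{align*}
```

The first bound uses $\frac{1}{s^2+1} \ge \frac12$ for $s \le 1$. The second uses $\frac{1}{s^2+1} \ge \frac15$ for $s \le 2$.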

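Estimator for $\ell_1$: high probability bound
Estimator: $\mathrm{median}(|C_1 z|, |C_2 z|, \dots, |C_k z|)$
Claim: for each $i$
  $\Pr[|C_i z| > \|z\|_1 \cdot (1-\epsilon)] > 1/2 + \Omega(\epsilon)$; let $L_i = 1$ if this event holds
  $\Pr[|C_i z| < \|z\|_1 \cdot (1+\epsilon)] > 1/2 + \Omega(\epsilon)$; let $U_i = 1$ if this event holds
Take $k = O\left(\frac{1}{\epsilon^2} \log \frac{1}{\delta}\right)$
  $E\left[\sum_i L_i\right] \ge k(1/2 + \Omega(\epsilon))$
  Hence $\Pr\left[\sum_i L_i \le \frac{k}{2}\right] < \delta$ (Chernoff bound)
  Similarly with $U_i$
The above means that $\mathrm{median}(|C_1 z|, |C_2 z|, \dots, |C_k z|) \in (1 \pm \epsilon)\|z\|_1$ with probability at least $1 - 2\delta$ (union bound over the two failure events)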
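
The concentration step can be made explicit with Hoeffding's inequality, which applies because the rows $C_i$, and hence the indicators $L_i$, are independent. In this sketch the $\Omega(\epsilon)$ term is written as $c\epsilon$ for an unspecified constant $c > 0$, which is a notational assumption:

```latex
\begin{align*}
\mathbb{E}\Big[\sum_{i=1}^k L_i\Big] &\ge k\Big(\tfrac12 + c\epsilon\Big), \\
\Pr\Big[\sum_{i=1}^k L_i \le \tfrac{k}{2}\Big]
&\le \Pr\Big[\sum_{i=1}^k L_i - \mathbb{E}\Big[\sum_{i=1}^k L_i\Big] \le -c\epsilon k\Big]
\le e^{-2c^2\epsilon^2 k}
\le \delta
\quad\text{for } k \ge \frac{\ln(1/\delta)}{2c^2\epsilon^2}.
\end{align*}
```

If more than $k/2$ of the values $|C_i z|$ exceed $(1-\epsilon)\|z\|_1$, then so does their median. The same argument with $U_i$ gives the upper bound.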

Yesterday’s Application: $\ell_1$ regression
Problem: $x^* = \arg\min_x \|Ax - b\|_1$
Weak DR: linear map $S: \mathbb{R}^n \to \mathbb{R}^k$, s.t. for any $p \in \mathbb{R}^n$:
  $\Pr_S\left[1 \le \frac{\|S(p)\|_1}{\|p\|_1} \le \frac{1}{\delta}\right] \ge 1 - O(\delta)$
  $S_{ij} \sim$ Cauchy distribution [I’00]
Weak(er) OSE: linear map $S: \mathbb{R}^n \to \mathbb{R}^k$ s.t. for any linear subspace $P \subset \mathbb{R}^n$ of dimension $d$:
  $\Pr_S\left[\forall p \in P: 1 \le \frac{\|S(p)\|_1}{\|p\|_1} \le d^{O(1)}\right] \ge 0.9$
  $k = O(d \cdot \log d)$ [SW’11, MM’13, WZ’13, WW’18]
+structured $S$, +preconditioner: $O\left(\mathrm{nnz}(A) \cdot \log n + (d/\epsilon)^{O(1)}\right)$
More: other norms ($\ell_p$, M-estimator, Orlicz norms), low-rank approximation & optimization, matrix multiplication, see [Woodruff, FnTTCS’14,…]

Today Application: Streaming 1
Challenge: log statistics of the data, using small space
[Figure: table of IP addresses and their frequencies]

Streaming statistics
Let $x_i$ = frequency of IP $i$
1st moment (sum): $\sum_i x_i$
  Trivial: keep a total counter
2nd moment (variance): $\sum_i x_i^2 = \|x\|_2^2$
  Trivially: $n$ counters → too much space
  Can’t do better if exact
  Small space via (approximate) dimension reduction in $\ell_2$
Example (IP frequency table): $\sum_i x_i = 7$, $\sum_i x_i^2 = 17$

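2nd frequency moment via DR
$x_i$ = frequency of IP $i$, with $x \in \mathbb{R}^n$
2nd moment: $\sum_i x_i^2 = \|x\|_2^2$
Store $S(x) = (G_1 x, G_2 x, \dots, G_k x) = \mathbf{G}x$, where $\mathbf{G}$ is a $k \times n$ matrix
Estimator:
  $\frac{1}{k}\|\mathbf{G}x\|_2^2 = \frac{1}{k}\left((G_1 x)^2 + (G_2 x)^2 + \dots + (G_k x)^2\right)$
Updating the sketch:
  Use linearity of the sketching function: $\mathbf{G}(x + e_i) = \mathbf{G}x + \mathbf{G}e_i$
Correctness from the dimension reduction guarantee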
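
The update rule can be sketched in code as follows. Regenerating the column $\mathbf{G}e_i$ from a per-item seeded generator, so that neither $\mathbf{G}$ nor $x$ is stored, is an illustrative assumption and not something the slide specifies. The class name `F2Sketch`, the IP strings, and $k = 1000$ are also hypothetical.

```python
import random

class F2Sketch:
    """Maintains Gx for a Gaussian matrix G without storing G or x."""

    def __init__(self, k, seed=0):
        self.k = k
        self.seed = seed
        self.sketch = [0.0] * k          # current value of Gx

    def _column(self, i):
        # Column G e_i, regenerated on demand from (seed, i).
        # Per-item seeding is an assumption made for this illustration.
        rng = random.Random(f"{self.seed}:{i}")
        return [rng.gauss(0.0, 1.0) for _ in range(self.k)]

    def update(self, i):
        # x <- x + e_i, hence Gx <- Gx + G e_i by linearity.
        for row, g in enumerate(self._column(i)):
            self.sketch[row] += g

    def estimate(self):
        # (1/k) * ||Gx||_2^2
        return sum(v * v for v in self.sketch) / self.k

# Frequencies 3, 2, 2, so the exact second moment is 9 + 4 + 4 = 17.
stream = ["10.0.0.1"] * 3 + ["10.0.0.2"] * 2 + ["10.0.0.3"] * 2
sk = F2Sketch(k=1000)
for ip in stream:
    sk.update(ip)
f2_estimate = sk.estimate()
print(f2_estimate)
```

Each update touches all $k$ counters, and the memory used is $k$ numbers regardless of how many distinct IPs appear.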

Streaming Scenario 2
[Figure: two IP frequency tables, with frequency vectors $x$ and $y$]
Question: difference in traffic
  $\sum_i |x_i - y_i| = \|x - y\|_1$
  in the example, $\|x - y\|_1 = 2$
Similar Qs: average delay/variance in a network, differential statistics between logs at different servers, etc.

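Sketching for Difference
Use $\ell_1$ sketching!
  Using a random $S(x) = \mathbf{C}x$ (the same $\mathbf{C}$ at both routers)
  Estimator: median of the coordinates of $|S(x) - S(y)|$
Already proved: can get a $1+\epsilon$ approximation with $k = O(1/\epsilon^2)$ (for constant success probability)
[Figure: each router sends its sketch, $\mathbf{C}x$ or $\mathbf{C}y$, as a short bit string; $\|x - y\|_1$ is estimated from the two sketches]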
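
A minimal sketch of the two-router scenario, assuming the routers share $\mathbf{C}$ by agreeing on a seed and regenerating each column $\mathbf{C}e_i$ on demand. The seed-sharing mechanism, the function names, the IP strings, and $k = 4001$ are illustrative assumptions.

```python
import math
import random
import statistics

def cauchy_column(seed, i, k):
    """Column C e_i of the shared Cauchy matrix, regenerated from (seed, i)."""
    rng = random.Random(f"{seed}:{i}")
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(k)]

def sketch_stream(stream, k, seed):
    """Return Cx, where x is the frequency vector of the stream."""
    s = [0.0] * k
    for i in stream:
        for row, c in enumerate(cauchy_column(seed, i, k)):
            s[row] += c
    return s

def estimate_l1_difference(sx, sy):
    """Median over coordinates of |Cx - Cy|, an estimate of ||x - y||_1."""
    return statistics.median(abs(a - b) for a, b in zip(sx, sy))

SHARED_SEED = 42   # both routers must use the same matrix C
K = 4001
router_x = ["10.0.0.1", "10.0.0.2", "10.0.0.2", "10.0.0.3"]
router_y = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.3", "10.0.0.4"]
sx = sketch_stream(router_x, K, SHARED_SEED)
sy = sketch_stream(router_y, K, SHARED_SEED)
diff_estimate = estimate_l1_difference(sx, sy)   # exact ||x - y||_1 is 3
print(diff_estimate)
```

Only the two length-$k$ sketches need to be exchanged. The raw logs stay on their routers.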

Sketching for $\ell_p$ norms
$p$-moment: $\sum_i x_i^p = \|x\|_p^p$
$p \le 2$:
  About $O(1/\epsilon^2)$ counters are enough
  works via $p$-stable distributions [Indyk’00]
$p > 2$:
  Can do (and need) $O_\epsilon(n^{1-2/p})$ counters
  [AMS’96, SS’02, BYJKS’02, CKS’03, IW’05, BGKS’06, BO10, AKO’11, G’11, BKSV’14,…]

Streaming 3: # distinct elements
Problem: compute the number of distinct elements in the stream
Trivial solution: $O(d)$ space for $d$ distinct elements
Will see: $O(\log d)$ space (approximate)

Distinct Elements [Flajolet-Martin’85, Alon-Matias-Szegedy’96]
Algorithm:
  Hash function $h: U \to [0,1]$
  Compute $minHash = \min_{i \in S} h(i)$
  Output is $\frac{1}{minHash} - 1$
As pseudocode:
  Initialize: minHash = 1; hash function h into [0,1]
  Process(int i): if (h(i) < minHash) minHash = h(i);
  Output: 1/minHash - 1
Main claim: $E[minHash] = \frac{1}{d+1}$, for $d$ distinct elements
Proof: repeats of the same element don’t matter
  $z$ = minimum of $d$ random numbers in $[0,1]$
  Pick another random number $a \in [0,1]$
  What’s the probability that $a < z$?
    1) exactly $z$ (for a fixed $z$), hence $E[z]$ overall
    2) the probability that $a$ is the smallest among $d+1$ reals: $\frac{1}{d+1}$
  Hence $E[z] = \frac{1}{d+1}$
Take majority of $k = O\left(\frac{1}{\epsilon^2} \log \frac{1}{\delta}\right)$ repetitions
[Figure: elements 5, 7, 2 hashed to points $h(5)$, $h(7)$, $h(2)$ on the interval $[0,1]$]
[Slide dated 11/13/2018 6:52 AM]
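
A runnable sketch of the min-hash estimator follows. Two choices here are assumptions rather than the slide's method: the hash into $[0,1]$ is built from salted BLAKE2b digests as a stand-in for an ideal random hash function, and the repetitions are combined by averaging the minima before inverting (motivated by $E[minHash] = \frac{1}{d+1}$) instead of the "majority" step mentioned above. The names `hash01` and `DistinctCounter` are hypothetical.

```python
import hashlib

def hash01(item, rep):
    """Hash an item to [0, 1]; `rep` selects one of several hash functions."""
    digest = hashlib.blake2b(
        str(item).encode(), digest_size=8, salt=rep.to_bytes(8, "big")
    ).digest()
    return int.from_bytes(digest, "big") / 2**64

class DistinctCounter:
    def __init__(self, reps=256):
        self.min_hash = [1.0] * reps     # one minHash per repetition

    def process(self, item):
        for rep in range(len(self.min_hash)):
            h = hash01(item, rep)
            if h < self.min_hash[rep]:
                self.min_hash[rep] = h

    def estimate(self):
        # E[minHash] = 1/(d+1): average the minima, then invert.
        avg = sum(self.min_hash) / len(self.min_hash)
        return 1.0 / avg - 1.0

counter = DistinctCounter()
stream = [i % 500 for i in range(1000)]  # 500 distinct elements, each seen twice
for item in stream:
    counter.process(item)
distinct_estimate = counter.estimate()
print(distinct_estimate)
```

Repeated elements hash to the same values, so they never change any stored minimum. This is the "repeats don't matter" step of the proof.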
