Presentation is loading. Please wait.

Presentation is loading. Please wait.

Sublinear Algorithmic Tools 2

Similar presentations


Presentation on theme: "Sublinear Algorithmic Tools 2"โ€” Presentation transcript:

1 Sublinear Algorithmic Tools 2
Alex Andoni

2 Plan Dimension reduction Sketching
Application: Numerical Linear Algebra Sketching Application: Streaming Application: Nearest Neighbor Search and moreโ€ฆ Dimension reduction: linear map ๐‘†: โ„ ๐‘› โ†’ โ„ ๐‘˜ s.t: for any points ๐‘,๐‘žโˆˆ โ„ ๐‘› : Pr ๐‘† ||๐‘† ๐‘ โˆ’๐‘†(๐‘ž)|| ||๐‘โˆ’๐‘ž|| โˆˆ 1ยฑ๐œ– โ‰ฅ1โˆ’๐›ฟ

3 Dimension reduction in other norms/distances?
E.g., โ„“ 1 ? Essentially no [CSโ€™02, BCโ€™03, LNโ€™04, JNโ€™10โ€ฆ] For ๐‘ points, ๐ท approximation: dimension between ๐‘ ฮฉ 1/ ๐ท and ๐‘‚(๐‘/๐ท) [BC03, NR10, ANN10โ€ฆ] even if map depends on the dataset! In contrast to โ„“ 2 : [JL] gives ๐‘‚( ๐œ– โˆ’2 log ๐‘ ), and doesnโ€™t depend on the dataset Generalize the notion of dimension reduction!

4 Computational view ๐‘†:๐‘ด๏‚ฎ โ„ ๐‘˜ 0,1 ๐‘˜ 0,1 ๐‘˜ ร— 0,1 ๐‘˜ ๏‚ฎโ„
๐‘ฅ ๐‘ฆ ๐‘†:๐‘ด๏‚ฎ โ„ ๐‘˜ Arbitrary computation ๐ถ:โ„๐‘˜ร—โ„๐‘˜๏‚ฎ โ„ + Cons: Less geometric structure (e.g., ๐ถ not metric) Pros: More expressability: better trade-off approximation vs โ€œdimensionโ€ ๐‘˜ Sketch ๐‘†: โ€œfunctional compression schemeโ€ for estimating distances almost all lossy (1+๐œ– distortion or more) and randomized 0,1 ๐‘˜ 0,1 ๐‘˜ ร— 0,1 ๐‘˜ ๏‚ฎโ„ ๐‘† ๐‘†(๐‘ฅ) ๐‘†(๐‘ฆ) ๐’… ๐‘ด (๐‘ฅ,๐‘ฆ)โ‰ˆ ๐‘–=1 ๐‘˜ ( ๐น ๐‘– (๐‘ฅ)โˆ’ ๐น ๐‘– (๐‘ฆ)) 2 ๐ถ(๐‘† ๐‘ฅ , ๐‘† ๐‘ฆ )

5 Sketching for โ„“ 1 Analog of Euclidean projections ?
For โ„“ 2 , we used: Gaussian distribution has stability property: ๐‘” 1 ๐‘ง 1 + ๐‘” 2 ๐‘ง 2 +โ€ฆ ๐‘” ๐‘‘ ๐‘ง ๐‘‘ is distributed as ๐‘”โ‹…||๐‘ง|| Is there something similar for 1-norm? Yes: Cauchy distribution! 1-stable: ๐‘ 1 ๐‘ง 1 + ๐‘ 2 ๐‘ง 2 +โ€ฆ ๐‘ ๐‘‘ ๐‘ง ๐‘‘ is distributed as ๐‘โ‹…||๐‘ง| | 1 Whatโ€™s wrong then? Cauchy are heavy-tailedโ€ฆ doesnโ€™t even have finite expectation (of abs) ๐‘๐‘‘๐‘“ ๐‘  = 1 ๐œ‹( ๐‘  2 +1)

6 Sketching for โ„“ 1 [Indykโ€™00]
Still, can consider similar random map ๐‘† ๐‘ฅ =๐‘ช๐‘ฅ= ๐ถ 1 ๐‘ฅ, ๐ถ 2 ๐‘ฅ, โ€ฆ, ๐ถ ๐‘˜ ๐‘ฅ Consider ๐‘† ๐‘ฅ โˆ’๐‘† ๐‘ฆ =๐‘ช๐‘ฅโˆ’๐‘ช๐‘ฆ=๐‘ช ๐‘ฅโˆ’๐‘ฆ =๐‘ช๐‘ง where ๐‘ง=๐‘ฅโˆ’๐‘ฆ each coordinate distributed as ||๐‘ง| | 1 ร—Cauchy Take 1-norm: ||๐‘ช๐‘ง| | 1 ? does not have finite expectation, butโ€ฆ Can estimate ||๐‘ง| | 1 by: median ๐ถ 1 ๐‘ง , ๐ถ 2 ๐‘ง ,โ€ฆ ๐ถ ๐‘˜ ๐‘ง Correctness claim: for each ๐‘– Pr ๐ถ ๐‘– ๐‘ง >||๐‘ง| | 1 โ‹…(1โˆ’๐œ–) >1/2+ฮฉ(๐œ–) Pr ๐ถ ๐‘– ๐‘ง <||๐‘ง| | 1 โ‹…(1+๐œ–) >1/2+ฮฉ(๐œ–)

7 Estimator for โ„“ 1 Estimator: median ๐ถ 1 ๐‘ง , ๐ถ 2 ๐‘ง ,โ€ฆ ๐ถ ๐‘˜ ๐‘ง
Correctness claim: for each ๐‘– Pr ๐ถ ๐‘– ๐‘ง >||๐‘ง| | 1 โ‹…(1โˆ’๐œ–) >1/2+ฮฉ(๐œ–) Pr ๐ถ ๐‘– ๐‘ง <||๐‘ง| | 1 โ‹…(1+๐œ–) >1/2+ฮฉ(๐œ–) Proof: ๐ถ ๐‘– ๐‘ง is distributed as abs ||๐‘ง| | 1 ๐‘ =||๐‘ง| | 1 โ‹…|๐‘| Hence claim equivalent to Pr ๐‘ > 1โˆ’๐œ– >1/2+ฮฉ ๐œ– Pr ๐‘ < 1+๐œ– >1/2+ฮฉ ๐œ– Matter of checking the pdf of the Cauchy varsโ€ฆ

8 Estimator for โ„“ 1 : high probability bnd
Estimator: median ๐ถ 1 ๐‘ง , ๐ถ 2 ๐‘ง ,โ€ฆ ๐ถ ๐‘˜ ๐‘ง Claim: for each ๐‘– Pr ๐ถ ๐‘– ๐‘ง >||๐‘ง| | 1 โ‹…(1โˆ’๐œ–) >1/2+ฮฉ(๐œ–) Pr ๐ถ ๐‘– ๐‘ง <||๐‘ง| | 1 โ‹…(1+๐œ–) >1/2+ฮฉ(๐œ–) Take ๐‘˜=๐‘‚ 1 ๐œ– 2 log 1 ๐›ฟ ๐ธ ๐‘– ๐ฟ ๐‘– โ‰ฅ๐‘˜(1/2+ฮฉ ๐œ– ) Hence Pr ๐‘– ๐ฟ ๐‘– โ‰ค ๐‘˜ 2 <๐›ฟ (CLT: Chernoff bound) Similarly with ๐‘ˆ ๐‘– The above means that median ๐ถ 1 ๐‘ง , ๐ถ 2 ๐‘ง ,โ€ฆ ๐ถ ๐‘˜ ๐‘ง โˆˆ 1ยฑ๐œ– ||๐‘ง| | 1 with probability at most ๐›ฟ ๐ฟ ๐‘– =1 if holds ๐‘ˆ ๐‘– =1 if holds

9 Yesterdayโ€™s Application: โ„“ 1 regression
Problem: ๐‘ฅ โˆ— =๐‘Ž๐‘Ÿ๐‘”๐‘š๐‘– ๐‘› ๐‘ฅ ||๐ด๐‘ฅโˆ’๐‘| | 1 +structured ๐‘†, +preconditioner: ๐‘‚ ๐‘›๐‘›๐‘ง ๐ด โ‹… log ๐‘› + ๐‘‘ ๐œ– ๐‘‚ 1 More: other norms ( โ„“ ๐‘ , M-estimator, Orlicz norms), low-rank approximation & optimization, matrix multiplication, see [Woodruff, FnTTCSโ€™14,โ€ฆ] Weak DR: linear map ๐‘†: โ„ ๐‘› โ†’ โ„ ๐‘˜ , s.t. for any ๐‘โˆˆ โ„ ๐‘› : Pr ๐‘† 1โ‰ค ||๐‘† ๐‘ | | 1 ||๐‘| | 1 โ‰ค 1 ๐›ฟ โ‰ฅ1โˆ’๐‘‚(๐›ฟ) ๐‘† ๐‘–๐‘— โˆผ Cauchy distribution [Iโ€™00] Weak(er) OSE: linear map ๐‘†: โ„ ๐‘› โ†’ โ„ ๐‘˜ s.t. for any linear subspace ๐‘ƒโŠ‚ โ„ ๐‘› of dimension ๐‘‘: Pr ๐‘† โˆ€๐‘โˆˆ๐‘ƒ :1โ‰ค ||๐‘† ๐‘ | | 1 ||๐‘| | 1 โ‰ค ๐‘‘ ๐‘‚ 1 โ‰ฅ0.9 [SWโ€™11, MMโ€™13, WZโ€™13, WWโ€™18] ๐‘˜=๐‘‚(๐‘‘โ‹…log ๐‘‘)

10 Today Application: Streaming 1
Challenge: log statistics of the data, using small space IP Frequency 3 2 9 8 1 IP Frequency 3 2

11 Streaming statistics Let ๐‘ฅ๐‘– = frequency of IP ๐‘–
1st moment (sum): โˆ‘ ๐‘ฅ ๐‘– Trivial: keep a total counter 2nd moment (variance): โˆ‘ ๐‘ฅ ๐‘– 2 =||๐‘ฅ| | 2 Trivially: ๐‘› counters โ†’ too much space Canโ€™t do better if exact Small space via (approximate) dimension reduction in โ„“ 2 IP Frequency 3 2 โˆ‘ ๐‘ฅ ๐‘– =7 โˆ‘ ๐‘ฅ ๐‘– 2 =17

12 2nd frequency moment via DR
๐‘› ๐‘ฅ๐‘– = frequency of IP ๐‘– 2nd moment: โˆ‘ ๐‘ฅ ๐‘– 2 =||๐‘ฅ| | 2 Store ๐‘† ๐‘ฅ = ๐บ 1 ๐‘ฅ, ๐บ 2 ๐‘ฅ,โ€ฆ ๐บ ๐‘˜ ๐‘ฅ =๐‘ฎ๐‘ฅ Estimator: 1 ๐‘˜ ||๐‘ฎ๐‘ฅ| | 2 = 1 ๐‘˜ ๐บ 1 ๐‘ฅ ๐บ 2 ๐‘ฅ 2 +โ€ฆ+ ๐บ ๐‘˜ ๐‘ฅ 2 Updating the sketch: Use linearity of the sketching function: ๐‘ฎ ๐‘ฅ+ ๐‘’ ๐‘– =๐‘ฎ๐‘ฅ+๐‘ฎ ๐‘’ ๐‘– Correctness from dimension reduction guarantee ๐‘ฎ ๐‘ฅ ๐‘˜

13 Streaming Scenario 2 ๐‘ฅ ๐‘ฆ Question : difference in traffic
๐‘ฅ IP Frequency 1 IP Frequency 1 2 ๐‘ฆ Question : difference in traffic โˆ‘ | ๐‘ฅ ๐‘– โ€“ ๐‘ฆ ๐‘– |= ๐‘ฅโˆ’๐‘ฆ 1 ๐‘ฅโˆ’๐‘ฆ 1 =2 Similar Qs: average delay/variance in a network differential statistics between logs at different servers, etc

14 Sketching for Difference
Use โ„“ 1 sketching! Using random ๐‘†(๐‘ฅ)=๐‘ช๐‘ฅ (common for 2 routers) Estimator: median |๐‘† ๐‘ฅ โˆ’๐‘† ๐‘ฆ | Already proved: can get 1+๐œ– approximation with ๐‘˜=๐‘‚ 1 ๐œ– 2 IP Frequency 1 2 IP Frequency 1 ๐‘ฆ ๐‘ฅ ๐‘† ๐‘† ๐‘ช๐‘ฅ 010110 ๐‘ช๐‘ฆ 010101 Estimate ||๐‘ฅโˆ’๐‘ฆ| | 1

15 Sketching for โ„“ ๐‘ norms ๐‘-moment: ฮฃ ๐‘ฅ ๐‘– ๐‘ = ๐‘ฅ ๐‘ ๐‘ ๐‘โ‰ค2 ๐‘>2
About ๐‘‚ 1 ๐œ– 2 counters enough works via ๐‘-stable distributions [Indykโ€™00] ๐‘>2 Can do (and need) ๐‘‚ ๐œ– ( ๐‘› 1โˆ’2/๐‘ ) counters [AMSโ€™96, SSโ€™02, BYJKSโ€™02, CKSโ€™03, IWโ€™05, BGKSโ€™06, BO10, AKOโ€™11, Gโ€™11, BKSVโ€™14,โ€ฆ]

16 Streaming 3: # distinct elements
Problem: compute the number of distinct elements in the stream Trivial solution: ๐‘‚(๐‘‘) space for ๐‘‘ distinct elements Will see: ๐‘‚(logโก๐‘‘) space (approximate) IP Frequency 1 2 ๐‘ฅ ยฉ 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

17 Distinct Elements 1 Algorithm:
11/13/2018 6:52 AM Initialize: minHash=1 hash function h into [0,1] Process(int i): if (h(i) < minHash) minHash = h(index); Output: 1/minHash-1 Distinct Elements [Flajolet-Martinโ€™85, Alon-Matias-Szegedyโ€™96] Algorithm: Hash function โ„Ž:๐‘ˆโ†’ 0,1 Compute ๐‘š๐‘–๐‘›๐ป๐‘Ž๐‘ โ„Ž= min ๐‘–โˆˆ๐‘† โ„Ž(๐‘–) Output is 1 ๐‘š๐‘–๐‘›๐ป๐‘Ž๐‘ โ„Ž โˆ’1 Main claim: ๐ธ ๐‘š๐‘–๐‘›๐ป๐‘Ž๐‘ โ„Ž = 1 ๐‘‘+1 , for ๐‘‘ distinct elements Proof: repeats of the same element donโ€™t matter ๐‘ง = minimum of ๐‘‘ random numbers in [0,1] Pick another random number ๐‘Žโˆˆ[0,1] Whatโ€™s the probability ๐‘Ž<๐‘ง ? 1) exactly ๐‘ง 2) probability it is smallest among ๐‘‘+1 reals: 1 ๐‘‘+1 Take majority of ๐‘˜=๐‘‚ 1 ๐œ– 2 log 1 ๐›ฟ repetitions 5 7 2 1 โ„Ž(5) โ„Ž(7) โ„Ž(2) ๏‚ป1/(๐‘‘+1) ยฉ 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.


Download ppt "Sublinear Algorithmic Tools 2"

Similar presentations


Ads by Google