Published by April Florence Bond. Modified over 9 years ago.
1
Sublinear Algorithms via Precision Sampling
Alexandr Andoni (Microsoft Research)
Joint work with: Robert Krauthgamer (Weizmann Inst.), Krzysztof Onak (CMU)
2
Goal
- Compute the number of Dacians in the empire
- Estimate $S = a_1 + a_2 + \dots + a_n$, where $a_i \in [0,1]$, sublinearly…
3
Sampling
- Send accountants to a subset $J$ of provinces, with $|J| = m$
- Estimator: $\tilde S = \frac{n}{m} \sum_{j \in J} a_j$
- Chebyshev bound: with 90% success probability, $0.5 \cdot S - O(n/m) < \tilde S < 2 \cdot S + O(n/m)$
- For constant additive error, need $m \sim n$
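The estimator on this slide can be sketched in a few lines. This is a minimal illustration, assuming the subset $J$ is drawn uniformly without replacement; the function name `sampled_sum_estimate` and its parameters are hypothetical, not from the talk.

```python
import random

def sampled_sum_estimate(a, m, rng):
    """Estimate sum(a) from m uniformly sampled entries, scaled by n/m.

    Assumption: J is a uniform random subset of size m (without replacement).
    """
    n = len(a)
    J = rng.sample(range(n), m)           # provinces that get an accountant
    return (n / m) * sum(a[j] for j in J)

# Sparse instance: only five provinces hold any Dacians.
rng = random.Random(0)
n = 1000
a = [0.0] * n
for j in rng.sample(range(n), 5):
    a[j] = 1.0

full = sampled_sum_estimate(a, n, rng)    # m = n visits every province
small = sampled_sum_estimate(a, 50, rng)  # always a multiple of n/m = 20
```

With $m = n$ the estimate is exact. With $m \ll n$ every sampled nonzero entry contributes $n/m$, which is the source of the $O(n/m)$ additive error in the bound above.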
4
Precision Sampling Framework
- Send accountants to each province, but require only approximate counts
- Estimate $\tilde a_i$ up to some pre-selected precision $u_i$: $|a_i - \tilde a_i| < u_i$
- Challenge: achieve a good trade-off between
  - the quality of the approximation to $S$
  - the total cost of estimating each $\tilde a_i$ to precision $u_i$
5
Formalization
Sum Estimator:
  1. fix precisions $u_i$
  3. given $\tilde a_1, \tilde a_2, \dots, \tilde a_n$, output $\tilde S$ s.t. $|\sum a_i - \tilde S| < 1$
Adversary:
  1. fix $a_1, a_2, \dots, a_n$
  2. fix $\tilde a_1, \tilde a_2, \dots, \tilde a_n$ s.t. $|a_i - \tilde a_i| < u_i$
- What is the cost?
  - Here, average cost $= \frac{1}{n} \sum 1/u_i$
  - To achieve precision $u_i$, use $1/u_i$ "resources": e.g., if $a_i$ is itself a sum $a_i = \sum_j a_{ij}$ computed by subsampling, then one needs $\Theta(1/u_i)$ samples
- For example, can choose all $u_i = 1/n$
  - Average cost $\approx n$
  - This is best possible if the estimator is $\tilde S = \sum \tilde a_i$
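The cost model and the uniform choice $u_i = 1/n$ can be made concrete with a small calculation. This is a sketch only; the helper names `average_cost` and `naive_sum_error_bound` are illustrative and do not come from the talk.

```python
def average_cost(precisions):
    """Average cost (1/n) * sum(1/u_i) for the chosen precisions u_i."""
    return sum(1.0 / u for u in precisions) / len(precisions)

def naive_sum_error_bound(precisions):
    """Worst-case error of the naive estimator sum(a_tilde_i).

    Each |a_i - a_tilde_i| < u_i, so the errors can add up to (just under)
    sum(u_i) when the adversary pushes all of them in the same direction.
    """
    return sum(precisions)

n = 1000
uniform_precisions = [1.0 / n] * n
cost = average_cost(uniform_precisions)            # approximately n
bound = naive_sum_error_bound(uniform_precisions)  # approximately 1
```

For the naive estimator, keeping the worst-case error below 1 forces $\sum u_i \le 1$, which is why an average cost of about $n$ cannot be improved without changing the estimator.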
6
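Precision Sampling Lemma
- Goal: estimate $\sum a_i$ from $\{\tilde a_i\}$ satisfying $|a_i - \tilde a_i| < u_i$.
- Precision Sampling Lemma: can get, with 90% success:
  - O(1) additive error and 1.5 multiplicative error: $S - O(1) < \tilde S < 1.5 \cdot S + O(1)$
  - with average cost equal to $O(\log n)$
- More generally, with $\varepsilon$ additive error and $1+\varepsilon$ multiplicative error: $S - \varepsilon < \tilde S < (1+\varepsilon) S + \varepsilon$, with average cost equal to $O(\varepsilon^{-3} \log n)$
- Example: distinguish $\sum a_i = 5$ vs $\sum a_i = 0$. Consider two extreme cases:
  - if five $a_i = 1$: sample all, but need only a crude approximation ($u_i = 1/10$)
  - if all $a_i = 5/n$: only a few with a good approximation $u_i = 1/n$, and the rest with $u_i = 1$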
7
Precision Sampling Algorithm
- Precision Sampling Lemma: can get, with 90% success:
  - O(1) additive error and 1.5 multiplicative error: $S - O(1) < \tilde S < 1.5 \cdot S + O(1)$
  - with average cost equal to $O(\log n)$
- Algorithm:
  - Choose each $u_i \in [0,1]$ i.i.d.
  - Estimator: $\tilde S$ = number of $i$'s such that $\tilde a_i / u_i > 6$ (modulo a normalization constant)
- Proof of correctness:
  - we use only $\tilde a_i$ which are $(1+\varepsilon)$-approximations to $a_i$
  - $E[\tilde S] \approx \sum \Pr[a_i / u_i > 6] = \sum a_i / 6$
  - $E[1/u] = O(\log n)$ w.h.p.
- For general $\varepsilon$, i.e. $S - \varepsilon < \tilde S < (1+\varepsilon) S + \varepsilon$ with average cost $O(\varepsilon^{-3} \log n)$:
  - Estimator: a function of $[\tilde a_i / u_i - 4/\varepsilon]_+$ and the $u_i$'s
  - Concrete distribution of each $u_i$: minimum of $O(\varepsilon^{-3})$ uniform random variables (u.r.v.)
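The constant-factor version of the algorithm can be simulated directly. This is a minimal sketch under stated assumptions: each $u_i$ is uniform on $(0,1]$ (consistent with $\Pr[a_i/u_i > 6] = a_i/6$), the "adversary" is replaced by random noise of magnitude below $u_i$, and the normalization constant is taken to be the threshold 6. The function name and parameters are hypothetical.

```python
import random

def precision_sampling_estimate(a, rng, threshold=6.0):
    """Return (estimate of sum(a), average cost (1/n) * sum(1/u_i))."""
    count = 0
    total_cost = 0.0
    for a_i in a:
        u_i = 1.0 - rng.random()                  # uniform on (0, 1], never 0
        # Stand-in for the adversary: any a_tilde with |a_i - a_tilde| < u_i.
        a_tilde = a_i + 0.999 * u_i * rng.uniform(-1.0, 1.0)
        if a_tilde / u_i > threshold:
            count += 1
        total_cost += 1.0 / u_i                   # cost of precision u_i
    return threshold * count, total_cost / len(a)

rng = random.Random(1)
a = [rng.random() for _ in range(100_000)]        # each a_i in [0, 1]
S = sum(a)
S_est, avg_cost = precision_sampling_estimate(a, rng)
```

Because $\tilde a_i / u_i$ is within 1 of $a_i / u_i$, an index is counted only if $a_i/u_i > 5$ and is always counted if $a_i/u_i > 7$, so the expected estimate lies between $6S/7$ and $6S/5$ whatever the adversary does. The average cost is heavy-tailed, so a single run can exceed the typical $O(\log n)$ value.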
8
Why?
- Save time:
  - Problem: computing edit distance between two strings
  - new algorithm that obtains a $(\log n)^{1/\varepsilon}$ approximation in $n^{1+O(\varepsilon)}$ time
  - via an efficient property-testing algorithm that uses Precision Sampling
  - More details: see the talk by Robi on Friday!
- Save space:
  - Problem: compute norms/frequency moments in streams
  - gives a simple and unified approach to compute all $\ell_p$, $F_k$ moments, and other goodies
  - More details: now
9
Streaming frequencies
- Setup: $1+\varepsilon$ estimate of frequencies in small space
  - Let $x_i$ = frequency of ethnicity $i$
  - $k$-th moment: $\sum x_i^k$
  - $k \in [0,2]$: space $O(1/\varepsilon^2)$ [AMS'96, I'00, GC07, Li08, NW10, KNW10, KNPW11]
  - $k > 2$: space $\tilde O(n^{1-2/k})$ [AMS'96, SS'02, BYJKS'02, CKS'03, IW'05, BGKS'06, BO10]
- Sometimes frequencies $x_i$ are negative:
  - if measuring traffic difference (delay, etc.)
  - We want a linear "dim reduction" $L: \mathbb{R}^n \to \mathbb{R}^m$, $m \ll n$

  Ethnicity    Frequency
  Dacians      358
  Galois       12
  Barbarians   2988
10
Norm Estimation via Precision Sampling
- Idea: use PSL to compute the sum $\|x\|_k^k = \sum |x_i|^k$
- General approach:
  1. Pick $u_i$'s according to PSL and let $y_i = x_i / u_i^{1/k}$
  2. Compute all $y_i^k$ up to additive approximation $O(1)$
     - Can be done by computing the heavy hitters of the vector $y$
  3. Use PSL to compute the sum $\|x\|_k^k = \sum |x_i|^k$
- Space bound is controlled by the norm $\|y\|_2$
  - Since heavy hitters under $\ell_2$ is the best we can do
  - Note that $\|y\|_2 \le \|x\|_2 \cdot E[1/u_i]$
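Why an additive $O(1)$ approximation of $|y_i|^k$ is the right target can be checked in one line. The derivation below assumes $a_i = |x_i|^k$ plays the role of the PSL summands and $\tilde a_i$ is any estimate with precision $u_i$; this pairing of notation is inferred from the lemma rather than stated on the slide.

```latex
\[
|y_i|^k = \left|\frac{x_i}{u_i^{1/k}}\right|^k = \frac{|x_i|^k}{u_i} = \frac{a_i}{u_i},
\qquad
|a_i - \tilde a_i| < u_i
\;\Longleftrightarrow\;
\left|\frac{a_i}{u_i} - \frac{\tilde a_i}{u_i}\right| < 1 .
\]
```

So estimating each $|y_i|^k$ to within an additive constant is the same as estimating $a_i$ to precision $O(u_i)$. Since the PSL estimator only tests whether $\tilde a_i / u_i$ exceeds a constant threshold, only the large coordinates of $y$ matter, which is what a heavy-hitters routine recovers.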
11
Streaming $F_k$ moments
- Theorem: linear sketch for $F_k$ with $O(1)$ approximation, $O(1)$ update, and $O(n^{1-2/k} \log n)$ space (in words).
- Sketch:
  - Pick random $u_i \in [0,1]$, $s_i \in \{\pm 1\}$, and let $y_i = s_i \cdot x_i / u_i^{1/k}$
  - throw into one hash table $H$ of size $m = O(n^{1-2/k} \log n)$ cells
- Update: on $(i, a)$: $H[h(i)] \mathrel{+}= s_i \cdot a / u_i^{1/k}$
- Estimator: $\max_{j \in [m]} |H[j]|^k$
- Randomness: $O(1)$ independence suffices
- Illustration: $x = (x_1, x_2, x_3, x_4, x_5, x_6)$, $H = (y_1 + y_3,\; y_4,\; y_2 + y_5 + y_6)$
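The sketch, update rule, and estimator above translate almost line by line into code. This is an illustrative sketch, not the authors' implementation: the class name `FkSketch` is hypothetical, and for simplicity it stores fully random $u_i$, $s_i$, and $h(i)$ for every coordinate, which costs $O(n)$ memory. The stated space bound would instead need these values to be generated from limited-independence hash functions.

```python
import random

class FkSketch:
    """Single hash table holding sums of y_i = s_i * x_i / u_i**(1/k)."""

    def __init__(self, n, k, m, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Assumption: fully random values stored explicitly, for illustration.
        self.u = [1.0 - rng.random() for _ in range(n)]   # uniform on (0, 1]
        self.s = [rng.choice((-1, 1)) for _ in range(n)]  # random signs
        self.h = [rng.randrange(m) for _ in range(n)]     # bucket of coordinate i
        self.table = [0.0] * m

    def update(self, i, a):
        """Process stream item (i, a), meaning x_i += a (a may be negative)."""
        self.table[self.h[i]] += self.s[i] * a / self.u[i] ** (1.0 / self.k)

    def estimate(self):
        """Slide's estimator: max over cells of |H[j]|**k (unnormalized)."""
        return max(abs(cell) for cell in self.table) ** self.k

sketch = FkSketch(n=100, k=3, m=16, seed=1)
sketch.update(7, 2.0)           # x_7 = 2
single = sketch.estimate()      # equals |x_7|**3 / u_7 when x has one nonzero
```

Because every update is a linear map of the stream item, inserting $(i, a)$ and then $(i, -a)$ leaves the table unchanged, which is what allows negative frequencies.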
12
More Streaming Algorithms
- Other streaming algorithms:
  - Algorithm for all $k$-moments, including $k \le 2$
    - For $k > 2$, improves existing space bounds [AMS96, IW05, BGKS06, BO10]
    - For $k \le 2$, worse space bounds [AMS96, I00, GC07, Li08, NW10, KNW10, KNPW11]
  - Improved algorithm for mixed norms ($\ell_p$ of $\ell_k$) [CM05, GBD08, JW09]
    - space bounded by the (Rademacher) $p$-type constant
  - Algorithm for the $\ell_p$-sampling problem [MW'10]
    - This work was extended to give tight bounds by [JST'11]
- Connections:
  - Inspired by the streaming algorithm of [IW05], but simpler
  - Turns out to be a distant relative of Priority Sampling [DLT'07]
13
Finale
- Other applications for the Precision Sampling framework?
- Better algorithms for precision sampling?
  - Best bound for average cost (for $1+\varepsilon$ approximation)
    - Upper bound: $O(\varepsilon^{-3} \log n)$ (tight for our algorithm)
    - Lower bound: $\Omega(\varepsilon^{-2} \log n)$
  - Bounds for other cost models?
    - E.g., for 1/square root of precision, the bound is $O(\varepsilon^{-3/2})$
- Other forms of "access" to the $a_i$'s?