1
Efficient Algorithms via Precision Sampling
Robert Krauthgamer (Weizmann Institute)
joint work with: Alexandr Andoni (Microsoft Research), Krzysztof Onak (CMU)
2
Goal
Compute the fraction of Dacians in the empire.
Estimate S = a_1 + a_2 + … + a_n, where each a_i ∈ [0,1].
3
Sampling
Send accountants to a subset J of the provinces, |J| = m.
Estimator: S̃ = (n/m) · ∑_{j∈J} a_j.
Chebyshev bound: with 90% success probability,
0.5·S − O(n/m) < S̃ < 2·S + O(n/m).
For constant additive error, need m ~ n.
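For concreteness, here is a minimal Python sketch of this subsampling estimator (the function name and interface are my own, not from the talk):

```python
import random

def subsample_estimate(a, m, seed=None):
    """Estimate S = sum(a) by auditing only m of the n provinces:
    S~ = (n/m) * sum of a_j over a uniformly random subset J, |J| = m."""
    rng = random.Random(seed)
    n = len(a)
    J = rng.sample(range(n), m)           # the m provinces we actually visit
    return (n / m) * sum(a[j] for j in J)
```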
4
Precision Sampling Framework
Send accountants to each province, but require only approximate counts.
Estimate each a_i up to a pre-selected precision u_i, i.e., |a_i − ã_i| < u_i.
Challenge: achieve a good tradeoff between
- the quality of the approximation to S, and
- the total cost of computing each ã_i (within precision u_i).
5
Formalization
The computation is a game between the estimator (algorithm) and an adversary:
1. Adversary fixes (hidden) a_1, a_2, …, a_n.
2. Estimator fixes precisions u_i.
3. Adversary fixes ã_1, ã_2, …, ã_n s.t. |a_i − ã_i| < u_i.
4. Estimator reports S̃ s.t. |∑_i a_i − S̃| < 1.
What is our cost model? Achieving precision u_i requires 1/u_i "resources": e.g., if a_i is itself a sum a_i = ∑_j a_ij computed by subsampling, then one needs Θ(1/u_i) samples.
Here, average cost = (1/n) · ∑_i 1/u_i.
For example, one can choose all u_i = 1/n; then the average cost is ≈ n.
This is best possible if the estimator is S̃ = ∑_i ã_i.
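A tiny sanity check of this cost model (the helper is hypothetical, transcribing the definition above):

```python
def average_cost(u):
    """Average cost of a precision vector u, as defined above: (1/n) * sum_i 1/u_i."""
    return sum(1.0 / ui for ui in u) / len(u)

n = 1000
assert abs(average_cost([1.0 / n] * n) - n) < 1e-6   # all u_i = 1/n costs ~ n on average
```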
6
Precision Sampling Lemma
Goal: estimate ∑ a_i from {ã_i} satisfying |a_i − ã_i| < u_i.
Precision Sampling Lemma: with 90% success probability, one can get O(1) additive error and 1.5 multiplicative error:
S − O(1) < S̃ < 1.5·S + O(1),
with average cost O(log n).
More generally, for any ε > 0: ε additive error and 1+ε multiplicative error,
S − ε < S̃ < (1+ε)·S + ε,
with average cost O(ε⁻³ · log n).
Example: distinguish ∑ a_i = 3 vs. ∑ a_i = 1. Consider two extreme cases:
- if three a_i = 1 (and the rest are 0): estimate all a_i with crude precision (u_i = 0.1);
- if all a_i = 3/n: estimate a few a_i with good precision u_i = 1/n, and the rest with u_i = 1.
7
Precision Sampling Algorithm
Recall the lemma (basic form): with 90% success, S − O(1) < S̃ < 1.5·S + O(1), with average cost O(log n).
Algorithm:
- Choose each u_i ∈ [0,1] i.i.d. uniformly.
- Estimator: S̃ = count of i's s.t. ã_i/u_i > 6 (and normalize).
Outline of analysis:
- E[count] = ∑_i Pr[ã_i/u_i > 6] = ∑_i Pr[a_i > (6±1)·u_i] ≈ ∑_i a_i/6 (the ±1 accounts for ã_i possibly having 1.5-multiplicative error w.r.t. a_i).
- E[1/u_i] = O(log n) w.h.p. (after truncation), which bounds the average cost.
For the general (1+ε) version: the estimator becomes a function of [ã_i/u_i − 4/ε]₊, the u_i's get a concrete distribution (the minimum of O(ε⁻³) uniform r.v.'s), and the guarantee is S − ε < S̃ < (1+ε)·S + ε with average cost O(ε⁻³ · log n).
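A minimal Python sketch of the basic variant; the simulated adversary (uniform noise of magnitude < u_i) and the truncation threshold are my assumptions, not from the talk:

```python
import random

def psl_estimate(a, seed=None):
    """Basic Precision Sampling estimator: O(1) additive, 1.5 multiplicative error.
    The adversary is simulated by uniform noise within the requested precision
    (any ã_i with |a_i - ã_i| < u_i would do)."""
    rng = random.Random(seed)
    n = len(a)
    # u_i i.i.d. uniform; truncated below so the cost 1/u_i stays bounded,
    # mirroring the "after truncation" step in the analysis.
    u = [max(rng.random(), 1.0 / n**2) for _ in range(n)]
    a_tilde = [a[i] + rng.uniform(-u[i], u[i]) for i in range(n)]
    count = sum(1 for i in range(n) if a_tilde[i] / u[i] > 6)
    avg_cost = sum(1.0 / ui for ui in u) / n      # O(log n) w.h.p.
    return 6 * count, avg_cost                    # E[count] ≈ (1/6) * sum(a)
```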
8
Why?
Save time:
- Problem: computing the edit distance between two strings.
- [FOCS'10]: a new algorithm that obtains (log n)^{1/ε} approximation in n^{1+O(ε)} time, via a property-testing-like algorithm using Precision Sampling (recursively).
Save space:
- Problem: compute norms/moments of frequencies in a data stream.
- [FOCS'11]: a simple and unified approach to compute all ℓ_p norms/moments, and related problems.
9
Streaming/sketching
[Figure: a stream of IP packets, summarized as tables of IP address → frequency.]
Challenge: log statistics of the data, using small space.
10
Streaming moments
Setup: (1+ε)-estimate statistics of the frequencies, in small space.
Let x_i = frequency of IP i. The p-th moment is ∑_i x_i^p.
- p = 1: keep one counter!
- p ∈ [0,2]: space O(ε⁻² · log n) [AMS'96, I'00, GC'07, Li'08, NW'10, KNW'10, KNPW'11]
- p > 2: space Õ_ε(n^{1−2/p}) [AMS'96, SS'02, BJKS'02, CKS'03, IW'05, BGKS'06, BO'11]
Generally, x ∈ R^n (updates: ±1 to a coordinate i).
A sketch is an embedding into a "space" of small dimension.
Usually linear, L: R^n → R^m with m ≪ n; thus L(x ± e_i) = Lx ± Le_i.
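To see why linearity matters for streaming, here is a toy linear sketch in Python; the random sign matrix is purely illustrative (the actual sketch is on the next slide):

```python
import random

class ToyLinearSketch:
    """Toy linear sketch: maintains Lx for a random m×n sign matrix L.
    Linearity gives L(x ± e_i) = Lx ± Le_i, so a stream update to x_i
    only needs column i of L."""
    def __init__(self, n, m, seed=0):
        rng = random.Random(seed)
        self.cols = [[rng.choice((-1, 1)) for _ in range(m)] for _ in range(n)]
        self.Lx = [0] * m                  # sketch of x = 0

    def update(self, i, delta):            # stream update: x_i += delta (±1)
        for j, sign in enumerate(self.cols[i]):
            self.Lx[j] += delta * sign
```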
11
ℓ_p moments
Theorem: a linear sketch for ℓ_p with O(1) approximation and O(n^{1−2/p} · log n) space (90% success probability).
Equivalently: a weak embedding of ℓ_p^n into ℓ_∞^m of dimension m = O(n^{1−2/p} · log n).
Sketch:
- pick random u_i ∈ [0,1] and r_i ∈ {±1}, and let y_i = r_i · x_i / u_i^{1/p};
- throw the y_i's into a hash table H with m = O(n^{1−2/p} · log n) cells.
Estimator: via PSL, or simply max_{j∈[m]} |H[j]|^p.
Randomness: O(1)-wise independence suffices.
[Figure: x = (x_1, …, x_6) hashed into the cells of H, e.g. H[1] = y_1+y_3, H[2] = y_4, H[3] = y_2+y_5+y_6.]
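A compact Python sketch of this construction with the max-cell estimator; a fully random hash stands in for O(1)-wise independence, and constants/normalization are simplified on my part:

```python
import random

def lp_moment_estimate(x, p, m, seed=0):
    """Estimate ||x||_p^p: y_i = r_i * x_i / u_i^{1/p}, hashed into m cells;
    report the max-cell estimator max_j |H[j]|^p (O(1) approx., w.const.prob.)."""
    rng = random.Random(seed)
    H = [0.0] * m
    for i in range(len(x)):
        u_i = 1.0 - rng.random()                 # uniform in (0, 1]
        r_i = rng.choice((-1, 1))                # random sign
        H[rng.randrange(m)] += r_i * x[i] / u_i ** (1.0 / p)
    return max(abs(cell) for cell in H) ** p
```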
12
Under the Hood: Using PSL
Idea: use PSL to compute the sum ||x||_p^p = ∑_i |x_i|^p.
- Assume ||x||_2 = 1, by scaling.
- Set the PSL additive error ε small compared to ||x||_2^p / n^{p/2−1} ≤ ||x||_p^p.
Outline:
1. Pick the u_i's according to PSL and let y_i = x_i / u_i^{1/p}.
2. Compute every y_i^p = x_i^p / u_i within additive approximation 1; done via heavy hitters of the vector y.
3. Use PSL on |y_i^p · u_i| = |x_i|^p to compute the sum ∑_i |x_i|^p.
The space bound is controlled by the norm ||y||_2², since heavy hitters under ℓ_2 is the best we can do.
Notice E[||y||_2²] = ||x||_2² · E[1/u^{2/p}] ≤ (1/ε)^{2/p} = (n^{p/2−1})^{2/p} = n^{1−2/p}.
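A quick check of the exponent arithmetic in the last line (a hypothetical helper; it only confirms that (1/ε)^{2/p} = n^{1−2/p} when ε = n^{1−p/2}):

```python
def y_norm_exponent(p):
    """Exponent of n in (1/eps)^{2/p} when eps = n^{1-p/2}, i.e. 1/eps = n^{p/2-1}."""
    return (p / 2 - 1) * (2 / p)                 # simplifies to 1 - 2/p

for p in (3, 4, 6):
    assert abs(y_norm_exponent(p) - (1 - 2 / p)) < 1e-12
```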
13
More Streaming Algorithms
Other streaming algorithms:
- The same algorithm works for all p-moments, including p ≤ 2.
  - For p > 2, it improves existing space bounds [AMS'96, IW'05, BGKS'06, BO'10].
  - For p ≤ 2, it gives worse space bounds [AMS'96, I'00, GC'07, Li'08, NW'10, KNW'10, KNPW'11].
- Algorithms for mixed norms (ℓ_p of ℓ_q) [CM'05, GBD'08, JW'09]; space bounded by the (Rademacher) p-type constant.
- Algorithm for the ℓ_p-sampling problem [MW'10]; extended to give tight bounds by [JST'10].
Connections:
- Inspired by the streaming algorithm of [IW'05], but simpler.
- Turns out to be a distant relative of Priority Sampling [DLT'07].
14
Finale
- Other applications of the Precision Sampling framework?
- Better algorithms for precision sampling? For the average cost (for 1+ε approximation):
  - upper bound: O(ε⁻³ · log n) (tight for our algorithm);
  - lower bound: Ω(ε⁻² · log n).
- Bounds for other cost models? E.g., when the cost of precision u is 1/√u, the bound is O(ε^{-3/2}).
- Other forms of "access" to the a_i's?