
1 Lp-Sampling
Lp-Sampling. David Woodruff, IBM Almaden. Joint work with Morteza Monemizadeh, TU Dortmund.

2 Output i with probability |x_i|^p/F_p, where F_p = |x|_p^p = Σ_{i=1}^n |x_i|^p
Given a stream of updates (i, a) to coordinates i of an n-dimensional vector x, where a is an integer, |a| < poly(n), and the stream length is < poly(n):
Output i with probability |x_i|^p/F_p, where F_p = |x|_p^p = Σ_{i=1}^n |x_i|^p
Easy cases:
p = 1 and updates all of the form (i, 1) for some i. Solution: choose a random update in the stream and output the coordinate it updates [Alon, Matias, Szegedy]. Generalizes to all positive updates.
p = 0 and there are no deletions. Solution: min-wise hashing: hash all distinct coordinates as you see them, maintaining the minimum hash and its item [Broder, Charikar, Frieze, Mitzenmacher] [Indyk] [Cormode, Muthukrishnan]
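The first easy case is classic reservoir sampling over the updates. A minimal sketch, assuming a stream given as a list of (i, 1) pairs (the function name and stream format are illustrative, not from the talk):

```python
import random

def l1_sample_unit_updates(stream, rng=random):
    """Output a coordinate i with probability x_i/|x|_1, assuming every
    update has the form (i, 1): replace the current sample by the t-th
    update with probability 1/t, so each update survives with probability
    1/(stream length)."""
    sample = None
    for t, (i, a) in enumerate(stream, start=1):
        assert a == 1, "this easy case handles only unit increments"
        if rng.randrange(t) == 0:  # keep update t with probability 1/t
            sample = i
    return sample
```

Since x_i equals the number of updates to coordinate i, a uniformly random update is exactly an L_1-sample, which is the [Alon, Matias, Szegedy] observation the slide refers to.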

3 Pr[I = j] = (1 ± ε)|x_j|^p/F_p
Our main result: for every 0 ≤ p ≤ 2, there is an algorithm that fails with probability ≤ n^{-100}, and otherwise outputs an I in [n] for which, for all j in [n], Pr[I = j] = (1 ± ε)|x_j|^p/F_p
So we can condition on every invocation succeeding in any poly(n)-time algorithm
The algorithm is 1-pass, uses poly(ε^{-1} log n) space and update time, and also returns w_I = (1 ± ε)|x_I|^p/F_p
Generalizes to a 1-pass, n^{1-2/p} poly(ε^{-1} log n)-space algorithm for p > 2
"Additive-error" samplers with Pr[I = j] = |x_j|^p/F_p ± ε were given explicitly in [Jayram, W] and implicitly in [Andoni, DoBa, Indyk, W]

4 Lp-sampling solves and unifies many well-studied streaming problems:

5 Solves Sampling with Deletions:
[Cormode, Muthukrishnan, Rozenbaum] want importance sampling with deletions: maintain a sample i with probability |x_i|/|x|_1. Set p = 1 in our theorem
[Chaudhuri, Motwani, Narasayya] ask to sample from the result of a SQL operation, e.g., a self-join. Set p = 2 in our theorem
[Frahling, Indyk, Sohler] study maintaining approximate range spaces and costs of Euclidean spanning trees. They need and obtain a routine to sample a point from a set undergoing insertions and deletions. Alternatively, set p = 0 in our theorem

6 Alternative solution to the Heavy Hitters Problem for any F_p:
Output all i for which |x_i|^p > φ F_p; do not output any i for which |x_i|^p < (φ/2) F_p
Studied by Charikar, Chen, Cormode, Farach-Colton, Ganguly, Muthukrishnan, and many others
Invoke our algorithm Õ(1/φ) times and use the approximations to the values
Optimal up to poly(ε^{-1} log n) factors
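A sketch of this reduction, with the L_p-sampler idealized as exact sampling from a known vector (the helper name and offline setup are illustrative; the real algorithm only sees the stream):

```python
import random

def heavy_hitters_via_sampler(x, p, phi, trials, rng):
    """Find all i with |x_i|^p > phi * F_p by invoking an L_p sampler
    `trials` times: a phi-heavy coordinate is sampled w.h.p., and the
    weight approximations returned with each sample let us discard
    anything below the phi/2 threshold.  The sampler is simulated
    exactly from x for illustration."""
    Fp = sum(abs(v) ** p for v in x)
    weights = [abs(v) ** p / Fp for v in x]
    seen = {rng.choices(range(len(x)), weights=weights)[0]
            for _ in range(trials)}
    return {i for i in seen if weights[i] >= phi / 2}
```

Taking trials = Õ(1/φ) suffices because each φ-heavy coordinate is hit with probability ≥ φ per invocation.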

7 Solves Block Heavy Hitters: given an n x d matrix R, return the indices i of rows R_i with |R_i|_p^p > φ · Σ_j |R_j|_p^p
[Andoni, DoBa, Indyk] study the case p = 1
Used by [Andoni, Indyk, Krauthgamer] for constructing a small-size sketch for the Ulam metric under the edit distance
Treat R as a big (nd)-dimensional vector and sample an entry (i, j) using our theorem for general p
The probability a row i is sampled is |R_i|_p^p / Σ_j |R_j|_p^p, so we can recover the IDs of all the heavy rows
We do not use Cauchy random variables or Nisan's pseudorandom generator, so this could be more practical than [ADI]

8 Alternative Solution to F_k-Estimation for any k ≥ 2:
Optimal up to poly(ε^{-1} log n) factors
Reduction given by [Coppersmith, Kumar]: take r = O(n^{1-2/k}) L_2-samples w_{i_1}, …, w_{i_r}; in parallel estimate F_2, call it F_2'; output (F_2'/r) · Σ_j w_{i_j}^{k-2}
Proof: second moment method
First algorithm not to use Nisan's pseudorandom generator
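The [Coppersmith, Kumar] estimator can be simulated offline. A sketch with an idealized L_2 sampler drawn from a known vector, where the samples w_{i_j} are taken to be the (exact) magnitudes |x_{i_j}|:

```python
import random

def fk_via_l2_samples(x, k, r, rng):
    """Draw r indices i_j with probability x_i^2/F_2 (idealized L_2
    sampling) and output (F_2/r) * sum_j |x_{i_j}|^{k-2}.  Unbiased:
    E = F_2 * sum_i (x_i^2/F_2) * |x_i|^{k-2} = sum_i |x_i|^k = F_k."""
    F2 = sum(v * v for v in x)
    idx = rng.choices(range(len(x)),
                      weights=[v * v / F2 for v in x], k=r)
    return (F2 / r) * sum(abs(x[i]) ** (k - 2) for i in idx)
```

The second moment method the slide mentions is what shows r = O(n^{1-2/k}) samples already give concentration.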

9 Solves Cascaded Moment Estimation:
Given an n x d matrix A with rows A_j, F_k(F_p)(A) = Σ_j |A_j|_p^{pk}
Problem initiated by [Cormode, Muthukrishnan], who show F_2(F_0)(A) uses O(n^{1/2}) space if there are no deletions, and ask about the complexity for other k and p
For any p in [0,2], our theorem gives O(n^{1-1/k}) space for F_k(F_p)(A):
We get entry (i, j) with probability |A_{i,j}|^p / Σ_{i',j'} |A_{i',j'}|^p, so the probability row A_i is returned is F_p(A_i) / Σ_j F_p(A_j)
If 2 passes are allowed, take O(n^{1-1/k}) samples A_i in the 1st pass, compute F_p(A_i) in the 2nd pass, and feed these into the F_k AMS estimator
To get 1 pass, feed the row IDs into an O(n^{1-1/k})-space algorithm of [Jayram, W] for estimating F_k based only on item IDs
The algorithm is space-optimal [Jayram, W]
Our theorem with p = 0 gives O(n^{1/2}) space for F_2(F_0)(A) with deletions

10 Ok, so how does it work?

11 General Framework [Indyk, W]
1. Form streams by subsampling
S_t = {i : |x_i| in [η^{t-1}, η^t)} for η = 1 + Θ(ε); S_t contributes if |S_t| η^{pt} ≥ ζ F_p(x), where ζ = poly(ε/log n) (assume p > 0 in this talk)
Let h: [n] -> [n] be a hash function. Create log n substreams Stream_1, Stream_2, …, Stream_{log n}, where Stream_j is the stream restricted to updates (i, c) with h(i) ≤ n/2^j
Suppose 2^j ≈ |S_t|. Then Stream_j contains about 1 item of S_t, and F_p(Stream_j) ≈ F_p(x)/2^j
|S_t| η^{pt} ≥ ζ F_p(x) then means η^{pt} ≥ ζ F_p(Stream_j)
2. Run a heavy hitters algorithm on the streams
Can find the item of S_t in Stream_j with an F_p-heavy hitters algorithm
3. Use the heavy hitters to estimate the contributing S_t
Repeat the sampling poly(ε^{-1} log n) times and count the number of times there was an item of S_t in Stream_j; use this to estimate the sizes of the contributing S_t, and F_p(x) ≈ Σ_t |S_t| η^{pt}
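Step 1 can be sketched as follows, with a random permutation standing in for the hash function h (names and the power-of-two n are illustrative):

```python
import random

def subsample_streams(stream, n, rng):
    """Pick h:[n]->[n] (here a random permutation) and route each update
    (i, c) to every substream j = 1..log n with h(i) <= n/2^j, so Stream_j
    restricts the stream to roughly an n/2^j-size random coordinate set."""
    h = list(range(1, n + 1))
    rng.shuffle(h)                  # h[i] is the hash value of coordinate i
    levels = n.bit_length() - 1     # log2(n) levels when n is a power of two
    streams = [[] for _ in range(levels)]
    for (i, c) in stream:
        for j in range(1, levels + 1):
            if h[i] <= n / 2 ** j:
                streams[j - 1].append((i, c))
    return streams
```

Because the same h defines every level, the substreams are nested, which is what lets the level with 2^j ≈ |S_t| isolate about one item of S_t.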

12 Additive Error Sampler [Jayram, W]
For each contributing S_t, we also get poly(ε^{-1} log n) items from the heavy hitters routine
If the sub-sampling is sufficiently random (Nisan's generator, min-wise independence), these items are random elements of S_t
Since we have (1 ± ε)-approximations s'_t to all contributing |S_t|, we can:
Choose a contributing t with probability s'_t η^{pt} / Σ_{t'} s'_{t'} η^{pt'}
Output a random heavy hitter found in S_t
For an item i in a contributing S_t, Pr[i output] = [s'_t η^{pt} / Σ_{t'} s'_{t'} η^{pt'}] · 1/|S_t| = (1 ± ε)|x_i|^p/F_p
For an item i in a non-contributing S_t, Pr[i output] = 0
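The two-stage selection can be sketched directly from its inputs, taken here as idealized: the recovered items per contributing class and the size estimates (all names hypothetical):

```python
import random

def jw_select(found_items, s_approx, eta, p, rng):
    """Choose a contributing class t with probability proportional to
    s'_t * eta^(p*t), then output a uniformly random heavy hitter
    recovered from that class's level set."""
    ts = sorted(found_items)
    weights = [s_approx[t] * eta ** (p * t) for t in ts]
    t = rng.choices(ts, weights=weights)[0]
    return rng.choice(found_items[t])
```

Multiplying the class probability by the 1/|S_t| chance of picking a given item gives the (1 ± ε)|x_i|^p/F_p bound stated above.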

13 Relative Error in Words
Force all classes to contribute: inject additional coordinates into each class whose purpose is to make every class contribute, injecting just enough so that overall F_p does not change by more than a (1+ε)-factor
Run [Jayram, W]-sampling on the resulting vector; if the item sampled is an injected coordinate, discard it
Repeat many times in parallel and take the first repetition that is not an injected coordinate
Since the injected coordinates contribute only an O(ε) fraction of the F_p mass, a small number of repetitions suffices

14 Some Minor Points
Before seeing the stream, we don't know which classes contribute, so we inject coordinates into every class
For S_t = {i : |x_i| in [η^{t-1}, η^t)}, inject Θ(ε F_p / (η^{pt} · #classes)) coordinates, where #classes = O(ε^{-1} log n)
We need to know F_p; just guess it, and verify the guess at the end of the stream
For some classes Θ(ε F_p / (η^{pt} · #classes)) < 1, e.g. if t is very large, so we can't inject any new coordinates there
Find all elements in these classes, and (1 ± ε)-approximations to their frequencies, separately using a heavy hitters algorithm
When sampling, either choose such a heavy hitter with the appropriate probability, or select from the contributing sets using [Jayram, W]
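The injection count per class, and the fact that the total injected F_p mass stays below εF_p, can be checked numerically (parameter values illustrative; the constant in the Θ is taken as 1):

```python
import math

def injected_count(eps, Fp, eta, p, t, num_classes):
    """Number of magnitude-eta^t dummy coordinates injected into class S_t:
    ~ eps * F_p / (eta^(p*t) * #classes), rounded down.  A count of 0 means
    the class is instead handled by the separate heavy hitters routine."""
    return max(math.floor(eps * Fp / (eta ** (p * t) * num_classes)), 0)
```

Each class receives injected mass at most εF_p/#classes, so summing over the O(ε^{-1} log n) classes adds at most εF_p overall, as the slide requires.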

15 There is a Problem
The [Jayram, W]-sampler fails with probability ≥ poly(ε/log n), in which case it can output any item; this is due to some of the subroutines of [Indyk, W] that it relies on, which only succeed with this probability
So the large poly(ε/log n) additive error is still there
We cannot repeat [Jayram, W] multiple times for amplification, since we get a collection of samples with no obvious way of detecting a failure
(On the other hand, for the simpler F_k-estimation problem one could just repeat [Indyk, W] and take the median)
Our solution: dig into the guts of the [Indyk, W] algorithm and amplify the success probability of its subroutines to ≥ 1 - n^{-100}

16 A Technical Point About [Indyk, W]
In [Indyk, W], we create log n substreams Stream_j, where Stream_j includes each coordinate independently with probability 2^{-j}
Can find the items of a contributing S_t in Stream_j with F_p-heavy hitters
Repeat the sampling poly(ε^{-1} log n) times and observe the fraction of repetitions in which there is an item of S_t in Stream_j
Can use [Indyk, W] to estimate every |S_t|, since every class contributes
Issue of misclassification: S_t = {i : |x_i| in [η^{t-1}, η^t)}, and the F_p-heavy hitters algorithm only reports approximate frequencies of the items i it finds
If |x_i| = η^t, it may be classified into S_t or S_{t+1} - it doesn't matter which
Simpler solution than in [Indyk, W]: if an item is misclassified, just classify it consistently if we see it again
This is equivalent to sampling from a vector x' with |x'|_p = (1 ± ε)|x|_p
Can ensure that with probability ≥ 1 - n^{-100} we obtain s'_t = (1 ± ε)|S_t| for all t

17 A Technical Point About [Jayram, W]
Since we have s'_t = (1 ± ε)|S_t| for all t, we choose a class t with probability s'_t η^{pt} / Σ_{t'} s'_{t'} η^{pt'} and output a random heavy hitter found in S_t
How do we output a random item of S_t? Use a min-wise independent hash function h: for each i in S_t, h(i) = min_{j in S_t} h(j) with probability (1 ± ε)/|S_t|
h can be an O(log 1/ε)-wise independent hash function
We recover the i* in S_t for which h(i*) is minimum
This is compatible with the sub-sampling, where Stream_j consists of the items i for which h(i) ≤ n/2^j
Our goal is to recover i* with probability ≥ 1 - n^{-100}
We have s'_t, so look at the level j* where |S_t|/2^{j*} = Θ(log n); if h is O(log n)-wise independent, then with probability ≥ 1 - n^{-100}, i* is in Stream_{j*}
A worry: maybe F_p(Stream_{j*}) >> F_p(x)/2^{j*}, so the heavy hitters algorithm doesn't work; this can be resolved with enough independent repetitions
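The min-wise selection step can be sketched with a fully random hash (a random permutation here; the slide's point is that O(log 1/ε)-wise independence already suffices, up to the (1 ± ε) factor):

```python
import random

def minwise_pick(S_t, n, rng):
    """Return the i* in S_t minimizing a random hash h:[n]->[n].  With a
    sufficiently independent h, every i in S_t is the minimizer with
    probability (1 +- eps)/|S_t|, i.e. a near-uniform sample of S_t."""
    h = list(range(n))
    rng.shuffle(h)
    return min(S_t, key=lambda i: h[i])
```

Using the same h for the sub-sampling means the minimizer i* is exactly the item that survives deepest in the nested substreams, which is where it is recovered.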

18 Beyond the Moraines: Sampling Records
Given an n x d matrix M with rows M_1, …, M_n, sample i with probability |M_i|_X / Σ_j |M_j|_X, where X is a norm
If i is sampled, return a vector v for which |v|_X = (1 ± ε)|M_i|_X
Applications: estimating planar EMD [Andoni, DoBa, Indyk, W]; sampling records in a relational database
Define classes S_t = {i : |M_i|_X in [η^{t-1}, η^t)} for η = 1 + Θ(ε)
If we have a heavy hitters algorithm for rows of a matrix, then we can apply a similar approach as before
The space should be d · poly(ε^{-1} log n)

19 Heavy Hitters for Rows Algorithm in [Andoni, DoBa, Indyk, W]
Partition the rows into B buckets, and in each bucket maintain the vector sum of the rows hashed to it
If |M_i|_X > γ Σ_j |M_j|_X, and v is the vector in the bucket containing M_i, then by the triangle inequality
|v|_X < |M_i|_X + |Noise|_X ≈ |M_i|_X + Σ_j |M_j|_X / B
|v|_X > |M_i|_X - |Noise|_X ≈ |M_i|_X - Σ_j |M_j|_X / B
For B large enough, the noise translates into a relative error
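The bucketing argument can be sketched for the concrete choice X = l_1 (norm choice, sizes, and names are illustrative):

```python
import random

def bucket_row_sums(rows, B, rng):
    """Hash each row to one of B buckets and maintain the vector sum per
    bucket.  A row whose norm dominates sits in a bucket whose sum has
    norm |M_i|_X +- |Noise|_X, with noise ~ (sum_j |M_j|_X)/B on average."""
    d = len(rows[0])
    sums = [[0.0] * d for _ in range(B)]
    assign = [rng.randrange(B) for _ in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            sums[assign[i]][j] += v
    return sums, assign

def l1(v):
    """The l_1 norm, standing in for the generic norm X."""
    return sum(abs(x) for x in v)
```

With B much larger than 1/γ, the γ-heavy row's bucket sum recovers its norm up to the stated relative error.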

20 Lower Bounds
Recall: for every 0 ≤ p ≤ 2, there is a randomized algorithm that with probability ≤ n^{-100} outputs FAIL, and otherwise outputs an I in [n] for which, for all j in [n], Pr[I = j] = (1 ± ε)|x_j|^p/F_p; the algorithm is 1-pass, poly(ε^{-1} log n)-space and time, and returns w_I = (1 ± ε)|x_I|^p/F_p
For p > 2, this gives n^{1-2/p} poly(ε^{-1} log n) space. Can we use less space for p > 2?
No: Ω(n^{1-2/p}) space is required for any ε, by a reduction from L_∞-estimation. Can be improved to Ω(n^{1-2/p} log n) using augmented L_∞-estimation [Jayram, W]
Can we output FAIL with probability 0? No: Ω(n) space is required for any ε, by a reduction from 2-party equality testing with no error
Given that we don't output FAIL, can we get a sampler with ε = 0? Yes for 2-pass algorithms, using rejection sampling. A 1-pass algorithm requires Ω(n) space if it outputs the corresponding probability w_I (needed in many applications); reduction from the 2-party INDEX problem

21 Some Open Questions
1-pass algorithms for Lp-sampling: if we output FAIL with probability ≤ n^{-100} and don't require outputting the sampled item's probability, can we get ε = 0 with low space?
The ε and log n factors are large. What is the optimal dependence on them? This is useful for F_k-estimation for k > 2, and other applications
Sampling from other distributions: given a vector (x_1, …, x_n) in a data stream, for which functions g can we sample from the distribution μ(i) = |g(x_i)| / Σ_j |g(x_j)|? E.g., random walks
Thank you

