
Sketching and Streaming Entropy via Approximation Theory. Nick Harvey (MSR/Waterloo), Jelani Nelson (MIT), Krzysztof Onak (MIT)


1 Sketching and Streaming Entropy via Approximation Theory. Nick Harvey (MSR/Waterloo), Jelani Nelson (MIT), Krzysztof Onak (MIT)

2 Streaming Model
x ∈ ℤ^n, m updates:
x = (0, 0, 0, 0, …, 0) → Increment x_1 → x = (1, 0, 0, 0, …, 0) → Increment x_4 → x = (1, 0, 0, 1, …, 0) → … → x = (9, 2, 0, 5, …, 12)
The algorithm sees each update once and maintains its answer as the stream goes by.
Goal: compute statistics, e.g. ||x||_1, ||x||_2, …
Trivial solution: store x (or store all updates) in O(n·log m) space.
Goal: compute using O(polylog(nm)) space.
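To make the model concrete, here is a minimal Python sketch of the trivial solution described above; the class name and interface are ours, not the paper's.

```python
# The trivial solution from the slide: store all of x, using O(n * log m)
# space. Streaming algorithms must answer the same queries with far less.

class ExactCounter:
    def __init__(self, n):
        self.x = [0] * n

    def update(self, i):
        self.x[i] += 1            # "Increment x_i"

    def l1(self):
        return sum(abs(v) for v in self.x)

stream = [0, 3, 3, 1, 0, 3]       # indices of the incremented coordinates
alg = ExactCounter(n=4)
for i in stream:                  # the algorithm sees each update once
    alg.update(i)
print(alg.l1())                   # 6 = ||x||_1, since x = (2, 1, 0, 3)
```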

3 Streaming Algorithms (a very brief introduction)
Fact [Alon-Matias-Szegedy '99], [Bar-Yossef et al. '02], [Indyk-Woodruff '05], [Bhuvanagiri et al. '06], [Indyk '06], [Li '08], [Li '09]: one can compute a (1±ε)-approximation to F_p = Σ_i |x_i|^p using
– O(ε^{-2}·log^c n) bits of space (if 0 ≤ p ≤ 2)
– O(ε^{-O(1)}·n^{1-2/p}·log^{O(1)} n) bits (if 2 < p < ∞)
Another fact: these bounds are mostly optimal [Alon-Matias-Szegedy '99], [Bar-Yossef et al. '02], [Saks-Sun '02], [Chakrabarti-Khot-Sun '03], [Indyk-Woodruff '03], [Woodruff '04]
– Proofs use communication complexity and information theory.
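As a concrete instance of these results, here is a toy version of the [Alon-Matias-Szegedy '99] F_2 estimator. For readability it stores its random signs explicitly; a real implementation derives them from 4-wise independent hash functions, so the space stays polylogarithmic.

```python
# Toy AMS F_2 sketch: each repetition keeps one counter z_r = <signs_r, x>,
# and E[z_r^2] = F_2. Averaging repetitions reduces the variance.

import random
import statistics

def ams_f2(stream, n, reps=200, seed=0):
    rng = random.Random(seed)
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(reps)]
    z = [0] * reps                    # one counter per repetition
    for i in stream:                  # a single pass over the updates
        for r in range(reps):
            z[r] += signs[r][i]
    return statistics.mean(zr * zr for zr in z)

stream = [0, 3, 3, 1, 0, 3]           # x = (2, 1, 0, 3), so F_2 = 14
print(ams_f2(stream, n=4))            # concentrates around 14
```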

4 Practical Motivation
General goal: dealing with massive data sets (internet traffic, large databases, …).
Network monitoring & anomaly detection:
– Stream consists of internet packets; x_i = # packets sent to port i.
– Under typical conditions, x is very concentrated.
– Under a "port scan attack", x is less concentrated.
– Can detect by estimating empirical entropy. [Lakhina et al. '05], [Xu et al. '05], [Zhao et al. '07]

5 Entropy
Probability distribution a = (a_1, a_2, …, a_n).
Entropy: H(a) = -Σ_i a_i·lg(a_i).
Examples:
– a = (1/n, 1/n, …, 1/n): H(a) = lg(n)
– a = (0, …, 0, 1, 0, …, 0): H(a) = 0
Entropy is small when the distribution is concentrated, large when it is not.
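A quick numerical check of the two examples, with the usual convention 0·lg 0 = 0:

```python
import math

def entropy(a):
    return -sum(p * math.log(p, 2) for p in a if p > 0)

n = 8
print(entropy([1.0 / n] * n))                  # lg(8) = 3.0
print(entropy([0, 0, 1, 0]))                   # zero entropy (prints -0.0)
```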

6 Streaming Algorithms for Entropy
How much space to estimate H(x)?
– [Guha-McGregor-Venkatasubramanian '06], [Chakrabarti-Do Ba-Muthu '06], [Bhuvanagiri-Ganguly '06]
– [Chakrabarti-Cormode-McGregor '07]: multiplicative (1±ε) approximation in O(ε^{-2}·log^2 m) bits; additive ε approximation in O(ε^{-2}·log^4 m) bits; Ω(ε^{-2}) lower bound for both.
Our contributions:
– Additive ε or multiplicative (1±ε) approximation in Õ(ε^{-2}·log^3 m) bits, and we can handle deletions.
– Can sketch entropy in the same space.

7 First Idea
If you can estimate F_p for p ≈ 1, then you can estimate H(x).
Why? Rényi entropy.

8 Review of Rényi Entropy
Definition: H_p(x) = log(Σ_i x_i^p) / (1 - p).
Convergence to Shannon: H_p(x) → H(x) as p → 1.
[Figure: plot of H_p(x) against p ∈ [0, 2], approaching H(x) at p = 1; portraits of Alfréd Rényi and Claude Shannon.]
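The convergence claim is easy to check numerically; the distribution below is our own example:

```python
import math

def shannon(x):
    return -sum(p * math.log(p, 2) for p in x if p > 0)

def renyi(x, p):
    return math.log(sum(v ** p for v in x), 2) / (1 - p)

x = [0.5, 0.25, 0.125, 0.125]
print(shannon(x))                              # 1.75
for p in (2.0, 1.5, 1.1, 1.01, 1.001):
    print(p, renyi(x, p))                      # climbs toward 1.75
```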

9 Overview of Algorithm
Set p = 1.01 and let x̃ = x / ||x||_1.
Compute F̃_p = (1±ε)·F_p(x̃) (using Li's "compressed counting").
Set H̃_p = log(F̃_p) / (1 - p), so that H̃_p ≈ H_p(x̃) ≈ H(x̃).
Analysis of the tradeoff: as p → 1, the approximation H_p(x̃) ≈ H(x̃) gets better, but the estimate H̃_p gets worse, since the 1/(1-p) factor amplifies the error in F̃_p.
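The sketch below simulates this tradeoff by perturbing the true F_p by a (1±ε) factor, as a streaming estimate would; it is a simulation of the error behavior, not an implementation of compressed counting:

```python
import math

def renyi_from_moment(f_p, p):
    return math.log(f_p, 2) / (1 - p)          # H_p from the p-th moment

x = [0.5, 0.25, 0.125, 0.125]                  # an already-normalized x
eps = 0.01
for p in (1.5, 1.1, 1.01, 1.001):
    f_p = sum(v ** p for v in x)               # true F_p(x)
    err = abs(renyi_from_moment((1 + eps) * f_p, p) - renyi_from_moment(f_p, p))
    print(p, err)                              # grows like eps / |1 - p|
```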

10 Making the Tradeoff
How quickly does H_p(x̃) converge to H(x̃)?
Theorem: let x̃ be a distribution with min_i x̃_i ≥ 1/m. Then taking p sufficiently close to 1, as a function of ε and log m, guarantees |H_p(x̃) - H(x̃)| ≤ ε.
Plugging in: O(ε^{-3}·log^4 m) bits of space suffice for an additive ε approximation, and similarly for a multiplicative approximation.

11 Proof: A Trick Worth Remembering
Let f: ℝ → ℝ and g: ℝ → ℝ be such that f(z) → 0 and g(z) → 0 as z → 0.
l'Hôpital's rule says that lim_{z→0} f(z)/g(z) = lim_{z→0} f'(z)/g'(z).
It actually says more! It says f(z)/g(z) converges to the limit at least as fast as f'(z)/g'(z) does.
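In our phrasing, the quantitative form follows from the Cauchy mean value theorem (assuming f(0) = g(0) = 0 and g' nonzero near 0):

```latex
% Quantitative l'Hopital via the Cauchy mean value theorem,
% assuming f(0) = g(0) = 0 and g' nonzero near 0:
\[
  \frac{f(z)}{g(z)} = \frac{f(z)-f(0)}{g(z)-g(0)} = \frac{f'(\xi)}{g'(\xi)}
  \quad\text{for some } \xi \text{ between } 0 \text{ and } z,
\]
\[
  \text{so}\quad
  \left|\frac{f(z)}{g(z)} - L\right|
  \le \sup_{0<|\xi|\le|z|}\left|\frac{f'(\xi)}{g'(\xi)} - L\right|,
  \qquad L = \lim_{z\to 0}\frac{f'(z)}{g'(z)}.
\]
```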

12 Improvements
Status: additive ε approximation using O(ε^{-3}·log^4 m) bits. How to reduce the space further?
– Interpolate with multiple points: H_{p_1}(x), H_{p_2}(x), …
[Figure: plot of H_p(x) against p; legend: Shannon (the value at p = 1), multiple Rényis, single Rényi.]

13 Analyzing Interpolation
Let f(z) be a C^{k+1} function.
Interpolate f with the polynomial q satisfying q(z_i) = f(z_i), 0 ≤ i ≤ k.
Fact: f(y) - q(y) = (f^{(k+1)}(ξ) / (k+1)!) · Π_{i=0}^{k} (y - z_i) for some ξ ∈ [a, b], where y, z_i ∈ [a, b].
Our case: set f(z) = H_{1+z}(x). Goal: analyze f^{(k+1)}(z).
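A small numerical check of this fact; f = exp is our choice, since its derivatives are easy to bound on [a, b]:

```python
import math
import numpy as np

k = 4
a, b = -1.0, 0.0
z = np.linspace(a, b, k + 1)                  # k+1 interpolation nodes
q = np.polyfit(z, np.exp(z), k)               # degree-k interpolant of exp
y = (z[0] + z[1]) / 2                         # a point strictly between nodes
err = abs(math.exp(y) - np.polyval(q, y))
# The fact above, with max |f^(k+1)| = e^b on [a, b]:
bound = math.exp(b) / math.factorial(k + 1) * np.prod(np.abs(y - z))
print(err, "<=", bound)
```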

14 Bounding Derivatives
Rényi derivatives are messy to analyze, so switch to Tsallis entropy:
Define f(z) = S_{1+z}(x), where S_q(x) = (1 - Σ_i x_i^q) / (q - 1).
Can prove Tsallis also converges to Shannon as q → 1.
Fact: the derivatives of f are small enough on [a, b] (when a = -O(1/(k·log m)), b = 0) that we can set k = log(1/ε) + loglog m.
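Checking the convergence claim numerically; note that S_q converges to the natural-log entropy -Σ x_i·ln x_i, and the example distribution is ours:

```python
import math

def tsallis(x, q):
    return (1 - sum(v ** q for v in x)) / (q - 1)

x = [0.5, 0.25, 0.125, 0.125]
print(-sum(p * math.log(p) for p in x))        # Shannon entropy in nats: ~1.213
for q in (1.5, 1.1, 1.01, 1.001):
    print(q, tsallis(x, q))                    # approaches the value above
```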

15 Key Ingredient: Noisy Interpolation
We don't have f(z_i); we have f(z_i) ± ε.
How to interpolate in the presence of noise?
Idea: pick the z_i very carefully.

16 Chebyshev Polynomials
Rogosinski's theorem: if q has degree k and |q(β_j)| ≤ 1 at the Chebyshev points β_j = cos(jπ/k), 0 ≤ j ≤ k, then |q(x)| ≤ |T_k(x)| for |x| > 1.
Map [-1, 1] onto the interpolation interval [z_0, z_k]; choose z_j to be the image of β_j, j = 0, …, k.
Let q̃(z) interpolate f(z_j) ± ε and let q(z) interpolate f(z_j). Then r(z) = (q̃(z) - q(z)) / ε satisfies Rogosinski's conditions!
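A numerical illustration (not a proof) of why this helps: random noise of size at most 1 at the points β_j extrapolates to at most |T_k(x)| outside [-1, 1]. The degree and test point below are arbitrary choices of ours:

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial import polynomial as P

k = 6
beta = np.cos(np.arange(k + 1) * np.pi / k)    # Chebyshev points, j = 0..k
rng = np.random.default_rng(0)
x = 1.2                                        # evaluation point outside [-1, 1]
t_k = C.chebval(x, [0] * k + [1])              # T_k(1.2), about 20.9
worst = 0.0
for _ in range(1000):
    noise = rng.uniform(-1, 1, k + 1)          # values bounded by 1 at the nodes
    q = P.polyfit(beta, noise, k)              # degree-k interpolant
    worst = max(worst, abs(P.polyval(x, q)))
print(worst, "<=", t_k)                        # Rogosinski: worst <= T_k(x)
```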

17 Tradeoff in Choosing z_k
z_k close to 0 ⇒ |T_k(preimage(0))| still small…
…but z_k close to 0 ⇒ high space complexity, since estimating F_{1+z_k} gets harder as 1 + z_k approaches 1.
Just how close do we need 0 and z_k to be? T_k grows quickly once its argument leaves the preimage of [z_0, z_k].
[Diagram: the interval [z_0, z_k] on the negative axis, with the evaluation point 0 just outside it.]

18 The Magic of Chebyshev
[Paturi '92]: T_k(1 + 1/k^c) ≤ e^{4k^{1-c/2}}. Set c = 2, so T_k(1 + 1/k^2) ≤ e^4 = O(1).
It suffices to set z_k = -O(1/(k^3·log m)).
This translates to Õ(ε^{-2}·log^3 m) space.
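Paturi's bound for c = 2 is easy to sanity-check numerically (in fact T_k(1 + 1/k^2) tends to cosh(√2) ≈ 2.18):

```python
import math
from numpy.polynomial import chebyshev as C

for k in (2, 5, 10, 50, 100):
    t_k = C.chebval(1 + 1 / k**2, [0] * k + [1])   # T_k evaluated at 1 + 1/k^2
    print(k, t_k, "<=", math.exp(4))               # stays ~2.1-2.2, far below e^4
```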

19 The Final Algorithm (additive approximation)
Set k = lg(1/ε) + lglg m and z_j = (k^2·cos(jπ/k) - (k^2+1)) / (9k^3·lg m) for 0 ≤ j ≤ k.
Estimate S̃_{1+z_j} = (1 - F̃_{1+z_j} / (F̃_1)^{1+z_j}) / z_j for 0 ≤ j ≤ k.
Interpolate the degree-k polynomial q̃ with q̃(z_j) = S̃_{1+z_j}.
Output q̃(0).
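Below is a plug-in, non-streaming rendition of this algorithm: the streaming estimates F̃_{1+z_j} are replaced by exact moments of x, so it demonstrates only the approximation-theory half (the node choice and the interpolation), not the sketching. The output is in nats, and the helper names are ours:

```python
import math
from numpy.polynomial import polynomial as P

def entropy_via_interpolation(x, eps):
    m = sum(x)                                     # F_1 (increment-only stream)
    lg_m = math.log2(m)
    k = math.ceil(math.log2(1 / eps) + math.log2(lg_m))
    # The nodes from the slide: Chebyshev points squeezed into a tiny
    # interval just to the left of 0.
    z = [(k**2 * math.cos(j * math.pi / k) - (k**2 + 1)) / (9 * k**3 * lg_m)
         for j in range(k + 1)]
    # Exact Tsallis values S_{1+z_j} = (1 - F_{1+z_j} / F_1^{1+z_j}) / z_j;
    # the streaming algorithm would use sketched moments here instead.
    s = [(1 - sum(v ** (1 + zj) for v in x) / m ** (1 + zj)) / zj for zj in z]
    # Interpolate in a rescaled variable for numerical conditioning; the
    # interpolant's value at 0 is unchanged by the rescaling.
    c = max(abs(zj) for zj in z)
    q = P.polyfit([zj / c for zj in z], s, k)
    return P.polyval(0.0, q)

x = [900, 50, 30, 20]                              # frequency vector, m = 1000
print(entropy_via_interpolation(x, eps=0.01))      # ~0.428
print(-sum(v / 1000 * math.log(v / 1000) for v in x))  # true H(x) in nats
```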

20 Multiplicative Approximation
How to get a multiplicative approximation?
– An additive approximation is already multiplicative, unless H(x) is small.
– H(x) small ⇒ one frequency is large [CCM '07].
We combine (1±ε)RF_1 and (1±ε)RF_{1+z_j}, where RF_p denotes the residual moment with the largest frequency removed, to get (1±ε)f(z_j).
Question: how do we get (1±ε)RF_p? Two different approaches:
– A general approach (for any p, and negative frequencies).
– An approach exploiting p ≈ 1, only for nonnegative frequencies (better by log m).

21 Questions / Thoughts
For what other problems can we use this "generalize-then-interpolate" strategy? Some non-streaming problems too?
The power of moments? The power of residual moments? CountMin (CM '05) + CountSketch (CCF '02) → HSS (Ganguly et al.)
WANTED: faster moment estimation (some progress in [Cormode-Ganguly '07]).

