1
Sketching and Streaming Entropy via Approximation Theory
Nick Harvey (MSR/Waterloo), Jelani Nelson (MIT), Krzysztof Onak (MIT)
2
Streaming Model
A vector x ∈ ℤ^n starts as x = (0, 0, 0, 0, …, 0) and receives m updates such as "increment x_1", "increment x_4", …; the algorithm sees the updates one at a time, and x ends up at some value like (9, 2, 0, 5, …, 12).
Goal: compute statistics of x, e.g. ‖x‖_1, ‖x‖_2, …
Trivial solution: store x (or store all updates) – O(n·log(m)) space.
Goal: compute using O(polylog(nm)) space.
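As a concrete illustration of the model, here is a minimal Python sketch of the trivial solution described above: it stores x explicitly and answers ‖x‖_1 and ‖x‖_2 queries. The class name and the example stream of updates are hypothetical; a real streaming algorithm would avoid storing x.

```python
import math

class TrivialStream:
    """Stores x explicitly -- the O(n log m) baseline that the talk improves on."""
    def __init__(self, n):
        self.x = [0] * n

    def increment(self, i, delta=1):
        self.x[i] += delta

    def l1(self):
        return sum(abs(v) for v in self.x)

    def l2(self):
        return math.sqrt(sum(v * v for v in self.x))

# Hypothetical stream of m = 6 updates over n = 5 coordinates.
s = TrivialStream(n=5)
for i in [0, 3, 3, 1, 4, 4]:
    s.increment(i)
print(s.x, s.l1(), s.l2())
```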
3
Streaming Algorithms (a very brief introduction)
Fact: [Alon-Matias-Szegedy '99], [Bar-Yossef et al. '02], [Indyk-Woodruff '05], [Bhuvanagiri et al. '06], [Indyk '06], [Li '08], [Li '09]
Can compute F̃_p = (1±ε)·F_p, where F_p = Σ_i |x_i|^p, using O(ε^{-2} log^c n) bits of space (if 0 ≤ p ≤ 2), or O(ε^{-O(1)} · n^{1-2/p} · log^{O(1)}(n)) bits (if 2 < p).
Another Fact: these bounds are mostly optimal: [Alon-Matias-Szegedy '99], [Bar-Yossef et al. '02], [Saks-Sun '02], [Chakrabarti-Khot-Sun '03], [Indyk-Woodruff '03], [Woodruff '04]
– Proofs use communication complexity and information theory
4
Practical Motivation
General goal: dealing with massive data sets – internet traffic, large databases, …
Network monitoring & anomaly detection:
– Stream consists of internet packets
– x_i = # packets sent to port i
– Under typical conditions, x is very concentrated
– Under a "port scan attack", x is less concentrated
– Can detect by estimating empirical entropy [Lakhina et al. '05], [Xu et al. '05], [Zhao et al. '07]
5
Entropy
Probability distribution a = (a_1, a_2, …, a_n)
Entropy H(a) = -Σ_i a_i lg(a_i)
Examples:
– a = (1/n, 1/n, …, 1/n): H(a) = lg(n)
– a = (0, …, 0, 1, 0, …, 0): H(a) = 0
Entropy is small when the distribution is concentrated, LARGE when it is not.
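A quick numerical check of the two examples above (a sketch; entropy is measured in bits, matching the slide's lg):

```python
import math

def shannon_entropy_bits(a):
    """H(a) = -sum_i a_i * lg(a_i), ignoring zero entries (0 * lg 0 = 0)."""
    return -sum(p * math.log2(p) for p in a if p > 0)

n = 8
uniform = [1.0 / n] * n
point_mass = [0.0] * n
point_mass[3] = 1.0
print(shannon_entropy_bits(uniform))     # lg(8) = 3.0
print(shannon_entropy_bits(point_mass))  # 0.0
```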
6
Streaming Algorithms for Entropy
How much space to estimate H(x)?
– [Guha-McGregor-Venkatasubramanian '06], [Chakrabarti-Do Ba-Muthu '06], [Bhuvanagiri-Ganguly '06]
– [Chakrabarti-Cormode-McGregor '07]: multiplicative (1±ε) approximation: O(ε^{-2} log^2 m) bits; additive ε approximation: O(ε^{-2} log^4 m) bits; Ω(ε^{-2}) lower bound for both
Our contributions:
– Additive ε or multiplicative (1±ε) approximation
– Õ(ε^{-2} log^3 m) bits, and can handle deletions
– Can sketch entropy in the same space
7
First Idea
If you can estimate F_p for p ≈ 1, then you can estimate H(x).
Why? Rényi entropy.
8
Review of Rényi Entropy
Definition: H_p(x) = lg(Σ_i x_i^p) / (1 - p)
Convergence to Shannon: H_p(x) → H(x) as p → 1
[Figure: plot of H_p(x) versus p; photos of Alfréd Rényi and Claude Shannon]
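A small sketch of the convergence claim, using the Rényi definition stated above (the example distribution is arbitrary):

```python
import math

def renyi_entropy_bits(a, p):
    """H_p(a) = lg(sum_i a_i^p) / (1 - p), for p != 1."""
    return math.log2(sum(v ** p for v in a if v > 0)) / (1.0 - p)

def shannon_entropy_bits(a):
    return -sum(v * math.log2(v) for v in a if v > 0)

a = [0.5, 0.25, 0.125, 0.125]
print(shannon_entropy_bits(a))            # 1.75
for p in [2.0, 1.5, 1.1, 1.01, 1.001]:
    print(p, renyi_entropy_bits(a, p))    # approaches 1.75 as p -> 1
```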
9
Overview of Algorithm
Set p = 1.01 and let x̃ = x/‖x‖_1 (the empirical distribution).
Compute F̃_p = (1±ε)·F_p(x̃) using Li's "compressed counting".
Set H̃ = lg(F̃_p)/(1-p), so that H̃ ≈ H_p(x̃) ≈ H(x̃).
Analysis: as p → 1, the approximation H_p(x̃) ≈ H(x̃) gets better, but the space needed to estimate F_p gets worse!
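A non-streaming sketch of this single-point estimator: exact frequency moments of the count vector x stand in for the compressed-counting estimates F̃_p, and H_p(x/‖x‖_1) is recovered from F_p and F_1. The count vector, helper names, and p = 1.01 are illustrative.

```python
import math

def Fp(x, p):
    """Exact frequency moment F_p = sum_i x_i^p (stand-in for a streaming estimate)."""
    return sum(v ** p for v in x if v > 0)

def renyi_from_moments(x, p):
    """H_p(x / ||x||_1) = lg(F_p / F_1^p) / (1 - p)."""
    return math.log2(Fp(x, p) / Fp(x, 1.0) ** p) / (1.0 - p)

x = [9, 2, 0, 5, 12, 1, 1]           # example count vector
a = [v / sum(x) for v in x]
H = -sum(v * math.log2(v) for v in a if v > 0)
print(renyi_from_moments(x, 1.01), H)  # close, since p is near 1
```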
10
Making the Tradeoff
How quickly does H_p(x) converge to H(x)?
Theorem: Let x be a distribution with min_i x_i ≥ 1/m over its nonzero entries. Then taking p sufficiently close to 1 (as a function of ε and log m) makes H_p(x) an additive or multiplicative ε-approximation of H(x).
Plugging in: O(ε^{-3} log^4 m) bits of space suffice for an additive approximation.
11
Proof: A Trick Worth Remembering
Let f : ℝ → ℝ and g : ℝ → ℝ be such that lim f(x) = lim g(x) = 0.
l'Hôpital's rule says that lim f(x)/g(x) = lim f'(x)/g'(x).
It actually says more! It says that f(x)/g(x) converges to its limit at least as fast as f'(x)/g'(x) does.
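A tiny numeric illustration of the trick with hypothetical functions f(x) = 1 - cos(x) and g(x) = x²: both f/g and f'/g' tend to 1/2 as x → 0, and f/g is at least as close to the limit at each point.

```python
import math

f  = lambda x: 1.0 - math.cos(x)   # f(0) = 0
g  = lambda x: x * x               # g(0) = 0
fp = lambda x: math.sin(x)         # f'
gp = lambda x: 2.0 * x             # g'

for x in [0.5, 0.1, 0.01]:
    ratio, dratio = f(x) / g(x), fp(x) / gp(x)
    print(x, abs(ratio - 0.5), abs(dratio - 0.5))  # error of f/g <= error of f'/g'
```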
12
Improvements
Status: additive ε-approximation using O(ε^{-3} log^4 m) bits.
How to reduce space further?
– Interpolate with multiple points: H_{p_1}(x), H_{p_2}(x), …
[Figure: plot of H_p(x) versus p near p = 1, comparing the Shannon value with a single Rényi evaluation and with interpolation through multiple Rényi evaluations]
13
Analyzing Interpolation
Let f(z) be a C^{k+1} function. Interpolate f with the polynomial q satisfying q(z_i) = f(z_i), 0 ≤ i ≤ k.
Fact (interpolation error): f(y) - q(y) = f^{(k+1)}(ξ)/(k+1)! · Π_{i=0..k} (y - z_i) for some ξ ∈ [a,b], where y, z_i ∈ [a,b].
Our case: set f(z) = H_{1+z}(x). Goal: analyze f^{(k+1)}(z).
[Figure: plot of H_p(x) versus p showing the interpolation points]
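A short sketch of the interpolation-error fact with a hypothetical smooth function f(z) = e^z on [a,b] = [-1,0] (chosen because all its derivatives are again e^z): the observed error never exceeds max|f^{(k+1)}|/(k+1)! · max|Π(y - z_i)|.

```python
import math
import numpy as np

k = 4
a, b = -1.0, 0.0
z = np.linspace(a, b, k + 1)                 # interpolation nodes
f = np.exp                                   # every derivative of e^z is e^z
q = np.polyfit(z, f(z), k)                   # degree-k interpolant (exact fit)

ys = np.linspace(a, b, 1000)
err = np.max(np.abs(f(ys) - np.polyval(q, ys)))
node_poly = np.max([abs(np.prod(y - z)) for y in ys])
bound = math.exp(b) / math.factorial(k + 1) * node_poly
print(err, bound, err <= bound)
```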
14
Bounding Derivatives
Rényi derivatives are messy to analyze. Switch to Tsallis entropy: f(z) = S_{1+z}(x), where S_q(x) = (1 - Σ_i x_i^q)/(q - 1).
Can prove Tsallis also converges to Shannon: S_{1+z}(x) → H(x) as z → 0.
Fact: the derivatives f^{(k+1)}(z) can be bounded on the interpolation interval [a, b] when a = -O(1/(k·log m)) and b = 0, and one can set k = log(1/ε) + loglog m.
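A sketch of Tsallis entropy and its convergence to Shannon entropy as z → 0. Note that S_{1+z} converges to entropy measured in nats (natural logarithm); the example distribution is illustrative.

```python
import math

def tsallis_entropy(a, q):
    """S_q(a) = (1 - sum_i a_i^q) / (q - 1), for q != 1."""
    return (1.0 - sum(v ** q for v in a if v > 0)) / (q - 1.0)

a = [0.5, 0.25, 0.125, 0.125]
H_nats = -sum(v * math.log(v) for v in a if v > 0)
print(H_nats)
for z in [0.1, 0.01, 0.001, 0.0001]:
    print(z, tsallis_entropy(a, 1.0 + z))   # approaches H_nats as z -> 0
```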
15
Key Ingredient: Noisy Interpolation
We don't have f(z_i); we have f(z_i) ± ε.
How to interpolate in the presence of noise?
Idea: pick the z_i very carefully.
16
Chebyshev Polynomials
Rogosinski's Theorem: if q(x) has degree k and |q(β_j)| ≤ 1 at the Chebyshev extremal points β_j = cos(jπ/k), 0 ≤ j ≤ k, then |q(x)| ≤ |T_k(x)| for |x| > 1.
Map [-1,1] onto the interpolation interval [z_0, z_k]; choose z_j to be the image of β_j, j = 0, …, k.
Let q̃(z) interpolate the noisy values f(z_j) ± ε and let q(z) interpolate the exact values f(z_j).
Then r(z) = (q̃(z) - q(z))/ε satisfies Rogosinski's conditions!
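A numeric sketch of this argument with a hypothetical f, interval, k, and noise level: interpolating at the images of the Chebyshev extremal points keeps the noisy and exact interpolants within ε·|T_k(preimage(0))| of each other at the evaluation point 0, even though 0 lies outside [z_0, z_k].

```python
import numpy as np

k = 5
z0, zk = -0.02, -0.40                        # hypothetical interpolation interval
beta = np.cos(np.arange(k + 1) * np.pi / k)  # Chebyshev extremal points in [-1, 1]
z = zk + (beta + 1.0) * (z0 - zk) / 2.0      # affine map sending 1 -> z0, -1 -> zk

f = lambda t: np.log(2.0 - t)                # hypothetical smooth f
eps = 1e-3
rng = np.random.default_rng(0)
noise = eps * rng.uniform(-1.0, 1.0, k + 1)  # |noise| <= eps at every node

q_exact = np.polyfit(z, f(z), k)
q_noisy = np.polyfit(z, f(z) + noise, k)

# Preimage of the evaluation point 0 under the affine map, and T_k there.
pre = 2.0 * (0.0 - zk) / (z0 - zk) - 1.0
Tk = np.cosh(k * np.arccosh(pre))            # T_k(x) = cosh(k arccosh x) for x >= 1
diff = abs(np.polyval(q_noisy, 0.0) - np.polyval(q_exact, 0.0))
print(diff, eps * Tk, diff <= eps * Tk)
```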
17
Tradeoff in Choosing z_k
z_k close to 0 ⇒ |T_k(preimage(0))| is still small…
…but z_k close to 0 ⇒ high space complexity.
T_k grows quickly once it leaves [z_0, z_k]. Just how close do we need 0 and z_k to be?
[Figure: the interval [z_0, z_k] with the evaluation point 0 lying just outside it]
18
The Magic of Chebyshev
[Paturi '92]: T_k(1 + 1/k^c) ≤ e^{4k^{1-c/2}}. Set c = 2, so T_k(1 + 1/k^2) ≤ e^4 = O(1).
Suffices to set z_k = -O(1/(k^3 log m)).
Translates to Õ(ε^{-2} log^3 m) space.
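A quick numeric check of the c = 2 case, using the identity T_k(x) = cosh(k·arccosh(x)) for x ≥ 1:

```python
import math

def cheb_T(k, x):
    """T_k(x) for x >= 1, via T_k(x) = cosh(k * arccosh(x))."""
    return math.cosh(k * math.acosh(x))

for k in [2, 5, 10, 50, 200]:
    print(k, cheb_T(k, 1.0 + 1.0 / k**2), math.exp(4))  # stays below e^4 ~ 54.6
```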
19
The Final Algorithm (additive approximation)
Set k = lg(1/ε) + lglg(m) and z_j = (k^2 cos(jπ/k) - (k^2+1)) / (9k^3 lg(m)) for 0 ≤ j ≤ k.
Estimate S̃_{1+z_j} = (1 - F̃_{1+z_j}/(F̃_1)^{1+z_j}) / z_j for 0 ≤ j ≤ k.
Interpolate the degree-k polynomial q̃ with q̃(z_j) = S̃_{1+z_j}.
Output q̃(0).
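A non-streaming sketch of the final algorithm: exact moments F_p of the count vector stand in for the streaming estimates F̃_p, interpolation is done by Lagrange evaluation at 0, and the result is compared to Shannon entropy in nats (the limit of the Tsallis quantities used). The count vector, ε, and the rounding of k are illustrative choices.

```python
import math

def Fp(x, p):
    """Exact frequency moment (stand-in for the streaming estimate F~_p)."""
    return sum(v ** p for v in x if v > 0)

def estimate_entropy_nats(x, eps):
    m = sum(x)                                   # total number of updates
    k = math.ceil(math.log2(1.0 / eps) + math.log2(math.log2(m)))
    # Nodes z_j = (k^2 cos(j*pi/k) - (k^2 + 1)) / (9 k^3 lg m), all slightly below 0.
    z = [(k**2 * math.cos(j * math.pi / k) - (k**2 + 1)) / (9 * k**3 * math.log2(m))
         for j in range(k + 1)]
    # Tsallis estimates S_{1+z_j} computed from the (here: exact) moments.
    S = [(1.0 - Fp(x, 1.0 + zj) / Fp(x, 1.0) ** (1.0 + zj)) / zj for zj in z]
    # Lagrange evaluation of the degree-k interpolant at 0.
    est = 0.0
    for j in range(k + 1):
        w = 1.0
        for i in range(k + 1):
            if i != j:
                w *= (0.0 - z[i]) / (z[j] - z[i])
        est += S[j] * w
    return est

x = [40, 3, 25, 1, 7, 100, 2, 2, 60, 10]
a = [v / sum(x) for v in x]
H_nats = -sum(v * math.log(v) for v in a if v > 0)
print(estimate_entropy_nats(x, eps=0.1), H_nats)
```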
20
Multiplicative Approximation
How to get a multiplicative approximation?
– An additive ε-approximation is also multiplicative, unless H(x) is small.
– H(x) small ⇒ some frequency x_i is large [CCM '07].
Suppose i* is the index of the most frequent element, and define the residual moments RF_p = Σ_{i ≠ i*} x_i^p.
We combine (1±ε)RF_1 and (1±ε)RF_{1+z_j} to get (1±ε)f(z_j).
Question: How do we get (1±ε)RF_p?
Two different approaches:
– A general approach (for any p, and negative frequencies)
– An approach exploiting p ≈ 1, only for nonnegative frequencies (better by a log(m) factor)
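A small sketch of the residual moments as read above. The definition with a single most frequent element removed is my reading of the slide's elided formula (in the spirit of [CCM '07]); the helper name and counts are illustrative.

```python
def residual_moment(x, p):
    """RF_p = sum of x_i^p over all i except one index of maximum frequency."""
    imax = max(range(len(x)), key=lambda i: x[i])
    return sum(v ** p for i, v in enumerate(x) if i != imax and v > 0)

x = [3, 120, 2, 5, 1, 4]    # one dominant frequency, so H(x/||x||_1) is small
print(residual_moment(x, 1.0), residual_moment(x, 1.01))
```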
21
Questions / Thoughts
For what other problems can we use this "generalize-then-interpolate" strategy?
– Some non-streaming problems too?
The power of moments? The power of residual moments?
– CountMin [CM '05] + CountSketch [CCF '02]
– HSS (Ganguly et al.)
WANTED: faster moment estimation (some progress in [Cormode-Ganguly '07])