Information Theory for Data Streams
David P. Woodruff, IBM Almaden
Talk Outline
1. Information Theory Concepts
2. Distances Between Distributions
3. An Example Communication Lower Bound
   – Randomized 1-way Communication Complexity of the INDEX problem
Discrete Distributions
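As standard background: a discrete distribution p over a finite set 𝒳 assigns a probability p(x) ≥ 0 to each x ∈ 𝒳, with

Σ_{x ∈ 𝒳} p(x) = 1.

A discrete random variable X ~ p takes value x with probability Pr[X = x] = p(x).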
Entropy (symmetric)
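The standard definition, with the convention 0 log 0 = 0 and logs base 2 throughout: for X ~ p,

H(X) = − Σ_x p(x) log p(x).

Entropy is symmetric in the sense that it depends only on the multiset of probabilities {p(x)}, not on which outcome carries which probability; e.g., a fair coin has H(X) = 1 bit.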
Conditional and Joint Entropy
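The standard definitions, for a pair (X, Y) with joint distribution p(x, y):

H(X, Y) = − Σ_{x,y} p(x, y) log p(x, y)
H(X | Y) = Σ_y p(y) H(X | Y = y) = − Σ_{x,y} p(x, y) log p(x | y)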
Chain Rule for Entropy
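The chain rule states

H(X, Y) = H(X) + H(Y | X),

and, by induction, H(X₁, …, X_n) = Σ_i H(X_i | X₁, …, X_{i−1}). It follows by expanding p(x, y) = p(x) p(y | x) inside the definition of joint entropy.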
Conditioning Cannot Increase Entropy
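The statement: H(X | Y) ≤ H(X), with equality if and only if X and Y are independent; equivalently, I(X ; Y) ≥ 0. One standard proof writes I(X ; Y) as a KL divergence, KL(p(x, y) ‖ p(x) p(y)) ≥ 0. Note the inequality holds on average over Y: for a particular outcome y, H(X | Y = y) can exceed H(X).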
Mutual Information
I(X ; Y) = H(X) − H(X | Y) = H(Y) − H(Y | X) = I(Y ; X)
Note: I(X ; X) = H(X) − H(X | X) = H(X)
Conditional mutual information: I(X ; Y | Z) = H(X | Z) − H(X | Y, Z)
Chain Rule for Mutual Information
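The standard statement, analogous to the chain rule for entropy:

I(X₁, …, X_n ; Y) = Σ_i I(X_i ; Y | X₁, …, X_{i−1}).

This decomposition is exactly what the Index lower bound later in the talk uses, with the X_i the bits of Alice's input and Y the protocol's message.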
Fano’s Inequality
Here X → Y → X' is a Markov chain, meaning X and X' are independent given Y: “past and future are conditionally independent given the present.” Think of X' as an estimate of X computed from Y alone. To prove Fano’s inequality, we need the data processing inequality.
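In its standard form: if X → Y → X', where X takes values in a finite set 𝒳, and P_e = Pr[X' ≠ X], then

H(P_e) + P_e log(|𝒳| − 1) ≥ H(X | X') ≥ H(X | Y),

where H(P_e) denotes binary entropy. In words: if X retains a lot of uncertainty given Y, then no estimator computed from Y can recover X with small error probability.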
Data Processing Inequality
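The statement: if X → Y → Z is a Markov chain, then

I(X ; Z) ≤ I(X ; Y).

Processing Y, even with randomness, cannot increase information about X. Proof via the chain rule: I(X ; Y, Z) = I(X ; Z) + I(X ; Y | Z) = I(X ; Y) + I(X ; Z | Y), and I(X ; Z | Y) = 0 by Markovity, so I(X ; Y) = I(X ; Z) + I(X ; Y | Z) ≥ I(X ; Z).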
Proof of Fano’s Inequality
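A sketch of the standard argument: let E = 1 if X' ≠ X and E = 0 otherwise, so H(E) = H(P_e). Since E is determined by (X, X'),

H(X | X') = H(E, X | X') = H(E | X') + H(X | E, X') ≤ H(P_e) + P_e log(|𝒳| − 1),

because given E = 0 we know X = X', and given E = 1, X ranges over at most |𝒳| − 1 values. Combining with H(X | X') ≥ H(X | Y) (data processing) yields Fano’s inequality.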
Tightness of Fano’s Inequality
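Fano’s inequality is tight; a standard construction: fix an error probability P_e, let X' be arbitrary, and conditioned on X' let X = X' with probability 1 − P_e and otherwise be uniform over the remaining |𝒳| − 1 values. Then

H(X | X') = H(P_e) + P_e log(|𝒳| − 1),

matching the bound with equality.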
16
Talk Outline
1. Information Theory Concepts
2. Distances Between Distributions
3. An Example Communication Lower Bound
   – Randomized 1-way Communication Complexity of the INDEX problem
17
Distances Between Distributions
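The two distances used below, for distributions P, Q on a common discrete domain (standard definitions):

Total variation distance: TV(P, Q) = ½ Σ_x |P(x) − Q(x)| = max_S |P(S) − Q(S)|
Hellinger distance: h(P, Q) = ( ½ Σ_x (√P(x) − √Q(x))² )^{1/2}

Both take values in [0, 1].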
18
Why Hellinger Distance?
19
Product Property of Hellinger Distance
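The key multiplicative property, often stated via the Bhattacharyya/fidelity coefficient F(P, Q) = Σ_x √(P(x) Q(x)) = 1 − h²(P, Q): since F factorizes over product distributions,

1 − h²(P × P', Q × Q') = (1 − h²(P, Q)) · (1 − h²(P', Q')).

This is what makes Hellinger distance convenient for arguments involving many independent coordinates; total variation distance has no comparably clean product rule.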
20
Jensen-Shannon Distance
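The standard definition, with M = (P + Q)/2:

JS(P, Q) = ½ KL(P ‖ M) + ½ KL(Q ‖ M), where KL(P ‖ Q) = Σ_x P(x) log (P(x) / Q(x)).

Unlike KL divergence, JS is symmetric and bounded (by 1 bit), and its square root is a metric.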
21
Relations Between Distance Measures
Total variation distance measures distinguishability: given a single sample drawn from P or Q (each equally likely), the best achievable probability of guessing which is ½ + δ/2, where δ = TV(P, Q).
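The standard comparison between the two measures, with the normalizations above:

h²(P, Q) ≤ TV(P, Q) ≤ √2 · h(P, Q).

So the two distances agree up to squaring, which lets one prove total-variation indistinguishability by bounding the better-behaved Hellinger distance.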
22
Talk Outline
1. Information Theory Concepts
2. Distances Between Distributions
3. An Example Communication Lower Bound
   – Randomized 1-way Communication Complexity of the INDEX problem
23
Randomized 1-Way Communication Complexity: The INDEX Problem
Alice has x ∈ {0,1}^n; Bob has j ∈ {1, 2, 3, …, n}. Alice sends a single message to Bob, who must output x_j with probability at least 1 − δ.
24
1-Way Communication Complexity of Index
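The bound proved on this and the following slide (the augmented variant below makes the calculation explicit): any δ-error one-way protocol for Index requires

CC_δ(Index) ≥ (1 − H(δ)) n

bits. Intuition: Bob can recover any single bit x_j from Alice's message M with probability 1 − δ, so by Fano's inequality M carries nearly one bit of information about every coordinate of a uniformly random X, giving |M| ≥ H(M) ≥ I(M ; X) ≥ n(1 − H(δ)).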
25
1-Way Communication of Index Continued
26
Typical Communication Reduction
Alice has a ∈ {0,1}^n and creates a stream s(a); Bob has b ∈ {0,1}^n and creates a stream s(b).
Lower bound technique:
1. Alice runs the streaming algorithm Alg on s(a) and transmits the state of Alg(s(a)) to Bob
2. Bob continues the execution to compute Alg(s(a), s(b))
3. If Bob can then solve g(a, b), the space complexity of Alg is at least the 1-way communication complexity of g
27
Example: Distinct Elements
Given a_1, …, a_m ∈ [n], how many distinct numbers are there?
Index problem: Alice has a bit string x ∈ {0,1}^n; Bob has an index i ∈ [n]; Bob wants to know if x_i = 1.
Reduction: s(a) = i_1, …, i_r, where index i_j appears if and only if x_{i_j} = 1, and s(b) = i.
If Alg(s(a), s(b)) = Alg(s(a)) + 1 then x_i = 0; otherwise x_i = 1.
So the space complexity of Alg is at least the 1-way communication complexity of Index.
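A minimal sketch of this reduction in Python, using an exact distinct-elements counter (a set of seen elements) as a stand-in for Alg; the names ExactDistinct, alice_message, and bob_decide are illustrative, not from the talk:

```python
# Sketch of the Index -> Distinct Elements reduction (illustrative names).
# Alg here is an *exact* distinct-elements counter whose state is the set
# of elements seen; the reduction shows why such an Alg needs Omega(n) bits.

class ExactDistinct:
    def __init__(self):
        self.seen = set()          # the algorithm's entire state

    def process(self, item):
        self.seen.add(item)

    def count(self):
        return len(self.seen)

def alice_message(x):
    """Alice streams s(a) = every index i with x[i] = 1, then sends the state."""
    alg = ExactDistinct()
    for i, bit in enumerate(x):
        if bit == 1:
            alg.process(i)
    return alg                     # "transmitting the state of Alg(s(a))"

def bob_decide(alg, i):
    """Bob appends s(b) = i and compares counts to recover x[i]."""
    before = alg.count()
    alg.process(i)
    # Count grew => i was new to the stream => x[i] = 0; otherwise x[i] = 1.
    return 0 if alg.count() == before + 1 else 1

x = [1, 0, 1, 1, 0]
assert bob_decide(alice_message(x), 1) == x[1]
assert bob_decide(alice_message(x), 2) == x[2]
```

Since Bob can recover any bit x_i from the transmitted state, the state must be at least as large as the one-way communication complexity of Index.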
28
Strengthening Index: Augmented Indexing
Augmented-Index problem: Alice has x ∈ {0,1}^n; Bob has i ∈ [n] together with the prefix x_1, …, x_{i−1}; Bob wants to learn x_i.
A similar proof shows an Ω(n) bound. Let M be Alice's message and X uniform on {0,1}^n. By the chain rule,
I(M ; X) = Σ_i I(M ; X_i | X_{<i}) = n − Σ_i H(X_i | M, X_{<i}).
By Fano's inequality, H(X_i | M, X_{<i}) ≤ H(δ), since X_i can be predicted with probability ≥ 1 − δ from M and X_{<i}. Hence
CC_δ(Augmented-Index) ≥ I(M ; X) ≥ n(1 − H(δ)).
29
Lower Bounds for Counting with Deletions
30
Gap-Hamming Problem
Alice has x ∈ {0,1}^n; Bob has y ∈ {0,1}^n.
Promise: the Hamming distance satisfies Δ(x, y) > n/2 + εn or Δ(x, y) < n/2 − εn.
Lower bound of Ω(ε^{−2}) for randomized 1-way communication [Indyk, W], [W], [Jayram, Kumar, Sivakumar].
This gives an Ω(ε^{−2}) bit lower bound for approximating the number of distinct elements.
The same holds for 2-way communication [Chakrabarti, Regev].
31
Gap-Hamming From Index [JKS]
Alice has x ∈ {0,1}^t; Bob has i ∈ [t]; set t = ε^{−2}.
Public coins: r^1, …, r^t, each uniform in {0,1}^t.
Alice forms a ∈ {0,1}^t with a_k = Majority_{j : x_j = 1} r^k_j; Bob forms b ∈ {0,1}^t with b_k = r^k_i.
Then |E[Δ(a, b)] − t/2| = Θ(x_i · t^{1/2}): when x_i = 1, a_k is correlated with r^k_i = b_k, shifting the expected Hamming distance away from t/2 by Θ(t^{1/2}); when x_i = 0 it is exactly t/2. Since t^{1/2} = εt, this shift is exactly the εn gap in the Gap-Hamming promise, so a Gap-Hamming protocol solves Index on t = ε^{−2} bits.
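A small simulation sketch of this reduction (illustrative, not code from the talk; majority ties are broken toward 1, a detail the slide does not specify):

```python
import random

def index_to_gap_hamming(x, i, t):
    """Map an Index instance (x, i) to a Gap-Hamming instance (a, b)
    via shared random strings r^1, ..., r^t; returns Delta(a, b)."""
    ones = [j for j in range(t) if x[j] == 1]   # coordinates where x_j = 1
    dist = 0
    for _ in range(t):
        r = [random.randint(0, 1) for _ in range(t)]   # public coin r^k
        # a_k = Majority_{j : x_j = 1} r^k_j (ties broken toward 1)
        a_k = 1 if 2 * sum(r[j] for j in ones) >= len(ones) else 0
        b_k = r[i]                                     # Bob's bit from r^k
        dist += (a_k != b_k)
    return dist

t = 100                       # t = eps^{-2} for eps = 0.1
x = [random.randint(0, 1) for _ in range(t)]
i = 7
avg = sum(index_to_gap_hamming(x, i, t) for _ in range(100)) / 100
# If x[i] = 1, avg sits roughly sqrt(t) away from t/2; if x[i] = 0 it is ~ t/2.
print(f"x[{i}] = {x[i]}, average distance = {avg:.1f}, t/2 = {t / 2}")
```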