
1 Information Theory for Data Streams David P. Woodruff IBM Almaden

2 Talk Outline
1. Information Theory Concepts
2. Distances Between Distributions
3. An Example Communication Lower Bound – Randomized 1-way Communication Complexity of the INDEX problem

3 Discrete Distributions

4 Entropy (symmetric)
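The formula itself is not in the transcript; the standard definition is H(X) = –Σ_x p(x) log2 p(x), which is symmetric under permuting the probabilities. A minimal Python sketch (function name illustrative):

import math

def entropy(p):
    """Shannon entropy, in bits, of a distribution given as a list of probabilities."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))  # biased coin: about 0.469 bits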

5 Conditional and Joint Entropy

6 Chain Rule for Entropy
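The chain rule states H(X, Y) = H(X) + H(Y | X). A numeric check on a small hypothetical joint table:

import math

def H(p):
    return -sum(v * math.log2(v) for v in p if v > 0)

joint = [[0.3, 0.2],   # joint[x][y] = Pr[X = x, Y = y]
         [0.1, 0.4]]
px = [sum(row) for row in joint]                 # marginal of X
H_joint = H([v for row in joint for v in row])   # H(X, Y)
H_Y_given_X = sum(px[x] * H([v / px[x] for v in joint[x]])
                  for x in range(len(joint)))    # sum_x p(x) H(Y | X = x)
assert abs(H_joint - (H(px) + H_Y_given_X)) < 1e-9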

7 Conditioning Cannot Increase Entropy (continuous case)

8 Conditioning Cannot Increase Entropy
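The slide formulas did not survive extraction; for reference, the standard discrete statement (using the mutual information defined on the next slide) is:
\[
  H(X \mid Y) \;\le\; H(X),
\]
with equality if and only if $X$ and $Y$ are independent, since
\[
  H(X) - H(X \mid Y) \;=\; I(X ; Y) \;=\; D\!\left(p_{XY}\,\middle\|\,p_X p_Y\right) \;\ge\; 0 .
\]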

9 Mutual Information
(Mutual Information) I(X ; Y) = H(X) – H(X | Y) = H(Y) – H(Y | X) = I(Y ; X)
Note: I(X ; X) = H(X) – H(X | X) = H(X)
(Conditional Mutual Information) I(X ; Y | Z) = H(X | Z) – H(X | Y, Z)
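A numeric sketch of these identities on a small hypothetical joint table, using the equivalent form I(X ; Y) = H(X) + H(Y) – H(X, Y):

import math

def H(p):
    return -sum(v * math.log2(v) for v in p if v > 0)

joint = [[0.3, 0.2],   # joint[x][y] = Pr[X = x, Y = y]
         [0.1, 0.4]]
px = [sum(row) for row in joint]          # marginal of X
py = [sum(col) for col in zip(*joint)]    # marginal of Y
H_xy = H([v for row in joint for v in row])

I = H(px) + H(py) - H_xy
print(I)  # positive here since X and Y are correlated; 0 iff independent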

10 Chain Rule for Mutual Information

11 Fano’s Inequality
Here X -> Y -> X’ is a Markov chain, meaning X’ and X are independent given Y: “past and future are conditionally independent given the present.”
To prove Fano’s inequality, we need the data processing inequality.
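The inequality itself is not in the transcript; the standard form, with error probability $P_e = \Pr[X' \neq X]$, $H(P_e)$ the binary entropy of $P_e$, and $X$ ranging over a set $\mathcal{X}$, is:
\[
  H(X \mid Y) \;\le\; H(P_e) + P_e \log\bigl(|\mathcal{X}| - 1\bigr).
\]
In particular, if $H(X \mid Y)$ is large, no estimator $X'$ computed from $Y$ can recover $X$ with small error probability.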

12 Data Processing Inequality
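For reference, the standard statement: if X -> Y -> Z is a Markov chain, then
\[
  I(X ; Z) \;\le\; I(X ; Y).
\]
Proof idea: expand $I(X ; Y, Z)$ by the chain rule in two ways, $I(X ; Z) + I(X ; Y \mid Z) = I(X ; Y) + I(X ; Z \mid Y)$, and note $I(X ; Z \mid Y) = 0$ by the Markov property.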

13 Proof of Fano’s Inequality

14 Tightness of Fano’s Inequality

15 Tightness of Fano’s Inequality

16 Talk Outline
1. Information Theory Concepts
2. Distances Between Distributions
3. An Example Communication Lower Bound – Randomized 1-way Communication Complexity of the INDEX problem

17 Distances Between Distributions

18 Why Hellinger Distance?

19 Product Property of Hellinger Distance
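With h²(p, q) = 1 – Σ_i √(p_i q_i), the product property says 1 – h²(p × p’, q × q’) = (1 – h²(p, q)) · (1 – h²(p’, q’)), since the sum factorizes over product distributions. A numeric check on hypothetical distributions:

import math
from itertools import product

def hsq(p, q):
    """Squared Hellinger distance: 1 - Bhattacharyya coefficient."""
    return 1 - sum(math.sqrt(a * b) for a, b in zip(p, q))

p, q = [0.5, 0.5], [0.9, 0.1]
p2, q2 = [0.3, 0.7], [0.4, 0.6]

pp = [a * b for a, b in product(p, p2)]   # product distribution p x p'
qq = [a * b for a, b in product(q, q2)]   # product distribution q x q'

lhs = 1 - hsq(pp, qq)
rhs = (1 - hsq(p, q)) * (1 - hsq(p2, q2))
assert abs(lhs - rhs) < 1e-12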

20 Jensen-Shannon Distance
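A minimal sketch of the standard definition, JS(P, Q) = ½ KL(P ‖ M) + ½ KL(Q ‖ M) with M = (P + Q)/2; its square root is a metric:

import math

def kl(p, q):
    """KL divergence in bits; assumes q > 0 wherever p > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def js(p, q):
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(js([0.5, 0.5], [0.9, 0.1]))  # symmetric, and finite even with zeros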

21 Relations Between Distance Measures
E.g., if two distributions have total variation distance δ, the best distinguisher given one sample succeeds with probability ½ + δ/2.

22 Talk Outline
1. Information Theory Concepts
2. Distances Between Distributions
3. An Example Communication Lower Bound – Randomized 1-way Communication Complexity of the INDEX problem

23 Randomized 1-Way Communication Complexity
INDEX PROBLEM: Alice holds x ∈ {0,1}^n, Bob holds j ∈ {1, 2, 3, …, n}, and Alice sends Bob a single message from which he must determine x_j.

24 1-Way Communication Complexity of Index

25 1-Way Communication of Index Continued

26 Typical Communication Reduction
Alice: a ∈ {0,1}^n, creates stream s(a). Bob: b ∈ {0,1}^n, creates stream s(b).
Lower Bound Technique:
1. Run streaming Alg on s(a) and transmit the state of Alg(s(a)) to Bob.
2. Bob computes Alg(s(a), s(b)).
3. If Bob solves g(a,b), the space complexity of Alg is at least the 1-way communication complexity of g.

27 Example: Distinct Elements
Given a_1, …, a_m in [n], how many distinct numbers are there?
Index problem: Alice has a bit string x in {0,1}^n, Bob has an index i in [n], and Bob wants to know if x_i = 1.
Reduction: s(a) = i_1, …, i_r, where i_j appears if and only if x_{i_j} = 1; s(b) = i.
If Alg(s(a), s(b)) = Alg(s(a)) + 1 then x_i = 0; otherwise x_i = 1.
So the space complexity of Alg is at least the 1-way communication complexity of Index.
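A toy end-to-end version of this reduction, with an exact distinct-elements counter standing in for the streaming algorithm (all names illustrative):

def run_alg(stream, state=None):
    """'Streaming algorithm': exact distinct count; the state is the set of items seen."""
    state = set() if state is None else state
    state.update(stream)
    return state

x = [1, 0, 1, 1, 0]   # Alice's bit string
i = 1                 # Bob's index (0-based here)

s_a = [j for j, bit in enumerate(x) if bit == 1]   # Alice's stream s(a)
state = run_alg(s_a)                               # state transmitted to Bob
before = len(state)
after = len(run_alg([i], state))                   # Bob appends his index
x_i = 0 if after == before + 1 else 1              # count grew => i was absent => x_i = 0
assert x_i == x[i]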

28 Strengthening Index: Augmented Indexing
Augmented-Index problem: Alice has x ∈ {0,1}^n; Bob has i ∈ [n] together with the prefix x_1, …, x_{i-1}, and wants to learn x_i.
A similar proof shows an Ω(n) bound:
I(M ; X) = Σ_i I(M ; X_i | X_{<i}) = n – Σ_i H(X_i | M, X_{<i})
By Fano’s inequality, H(X_i | M, X_{<i}) ≤ H(δ), since Bob can predict X_i with probability ≥ 1 – δ from M and X_{<i}.
Hence CC_δ(Augmented-Index) ≥ I(M ; X) ≥ n(1 – H(δ)).

29 Lower Bounds for Counting with Deletions

30 Gap-Hamming Problem
Alice: x ∈ {0,1}^n. Bob: y ∈ {0,1}^n.
Promise: the Hamming distance satisfies Δ(x,y) > n/2 + εn or Δ(x,y) < n/2 – εn.
Lower bound of Ω(ε^{-2}) for randomized 1-way communication [Indyk, W], [W], [Jayram, Kumar, Sivakumar].
Gives an Ω(ε^{-2}) bit lower bound for approximating the number of distinct elements.
The same holds for 2-way communication [Chakrabarti, Regev].

31 Gap-Hamming From Index [JKS]
Alice: x ∈ {0,1}^t. Bob: i ∈ [t]. Set t = ε^{-2}.
Public coin: r^1, …, r^t, each uniform in {0,1}^t.
Alice sets a ∈ {0,1}^t with a_k = Majority of r^k_j over j such that x_j = 1; Bob sets b ∈ {0,1}^t with b_k = r^k_i.
Then E[Δ(a,b)] = t/2 + x_i · t^{1/2}; since t^{1/2} = εt, deciding the Gap-Hamming promise on (a, b) reveals x_i.
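A small simulation of the reduction (parameters and tie-breaking are illustrative; with this majority convention the shift for x_i = 1 comes out downward, while the slide’s sign convention may differ):

import random

t = 100                                    # think t = 1/eps^2
x = [j % 2 for j in range(t)]              # Alice's string; odd indices are 1
ones = [j for j in range(t) if x[j] == 1]

def avg_hamming(i, trials=50):
    """Average Delta(a, b) over fresh public coins r^1, ..., r^t."""
    total = 0
    for _ in range(trials):
        dist = 0
        for _k in range(t):
            r = [random.randint(0, 1) for _ in range(t)]                 # public coin r^k
            a_k = 1 if 2 * sum(r[j] for j in ones) > len(ones) else 0    # majority over {j : x_j = 1}
            b_k = r[i]                                                   # Bob's bit
            dist += a_k != b_k
        total += dist
    return total / trials

print(avg_hamming(1))   # x_1 = 1: about t/2 - Theta(sqrt(t)), since a_k correlates with r^k_1
print(avg_hamming(0))   # x_0 = 0: about t/2, since b_k is independent of a_k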

