Information Complexity Lower Bounds for Data Streams David Woodruff IBM Almaden.
Last time: 1-Way Communication of Index
Alice has x ∈ {0,1}^n, Bob has i ∈ [n]. Alice sends a (randomized) message M to Bob.
I(M; X) = Σ_i I(M; X_i | X_{<i}) ≥ Σ_i I(M; X_i) = n − Σ_i H(X_i | M)
Fano: H(X_i | M) ≤ H(δ) if Bob outputs X_i correctly with probability ≥ 1 − δ
CC_δ(Index) ≥ I(M; X) ≥ n(1 − H(δ))
The same lower bound applies if the protocol is only correct on average over x and i drawn independently from a uniform distribution.
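The final bound n(1 − H(δ)) is easy to evaluate numerically. A minimal sketch (the function names are illustrative, not from the lecture):

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits; H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def index_lower_bound(n, delta):
    """The n*(1 - H(delta)) lower bound on 1-way communication of Index."""
    return n * (1 - binary_entropy(delta))

# e.g. index_lower_bound(1000, 1/3) is about 81.7 bits
```

Note how weak the bound becomes as δ approaches 1/2: the entropy term H(δ) approaches 1 and the bound degrades to 0, which is why the low-error regime on the next slides needs a separate argument.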

Distributional Communication Complexity
[figure: Alice holds X, Bob holds Y; the goal is to compute f(X,Y)]

Indexing is Universal for Product Distributions [Kremer, Nisan, Ron]

Indexing with Low Error
The Index problem with 1/3 error probability and with 0 error probability both have Θ(n) communication.
In some applications we want lower bounds in terms of the error probability.
Indexing on Large Alphabets:
–Alice has x ∈ {0,1}^{n/δ} with wt(x) = n; Bob has i ∈ [n/δ]
–Bob wants to decide if x_i = 1 with error probability δ
–[Jayram, W] 1-way communication is Ω(n log(1/δ))

Beyond Product Distributions

Non-Product Distributions
Needed for stronger lower bounds.
Example: approximate |x|_∞ up to a multiplicative factor of B in a stream
–gives lower bounds for heavy hitters, p-norms, etc.
Gap_∞(x, y) problem: x ∈ {0, …, B}^n, y ∈ {0, …, B}^n
Promise: |x − y|_∞ ≤ 1 or |x − y|_∞ ≥ B
The hard distribution is non-product.
Ω(n/B²) lower bound [Saks, Sun] [Bar-Yossef, Jayram, Kumar, Sivakumar]

Direct Sums
Gap_∞(x, y) doesn't have a hard product distribution, but it has a hard distribution μ = λ^n in which the coordinate pairs (x_1, y_1), …, (x_n, y_n) are independent:
–w.p. 1 − 1/n, (x_i, y_i) is random subject to |x_i − y_i| ≤ 1
–w.p. 1/n, (x_i, y_i) is random subject to |x_i − y_i| ≥ B
Direct Sum: solving Gap_∞(x, y) requires solving n single-coordinate sub-problems g.
In g, Alice and Bob have J, K ∈ {0, …, B} and want to decide if |J − K| ≤ 1 or |J − K| ≥ B.
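The hard distribution above can be sampled directly. The slides do not pin down the conditional distributions exactly, so the sketch below assumes "random subject to …" means uniform over the allowed pairs; note that |j − k| ≥ B over {0, …, B} forces {j, k} = {0, B}:

```python
import random

B, n = 10, 1000

def sample_close():
    # Uniform pair (j, k) in {0,...,B}^2 with |j - k| <= 1 (assumed form of lambda),
    # via rejection sampling
    while True:
        j, k = random.randint(0, B), random.randint(0, B)
        if abs(j - k) <= 1:
            return j, k

def sample_far():
    # |j - k| >= B over {0,...,B} forces the pair {0, B}
    return random.choice([(0, B), (B, 0)])

def sample_mu():
    # x, y with independent coordinate pairs: 'far' w.p. 1/n, else 'close'
    pairs = [sample_far() if random.random() < 1 / n else sample_close()
             for _ in range(n)]
    x, y = zip(*pairs)
    return list(x), list(y)
```

With this choice, a sample from μ has no far coordinate with constant probability (≈ (1 − 1/n)^n ≈ 1/e), so both sides of the promise occur with constant probability.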

Direct Sum Theorem
Let M be the message from Alice to Bob.
For X, Y ∼ μ, I(M; X, Y) = H(X, Y) − H(X, Y | M) is the information cost of the protocol.
[BJKS]: why not measure I(M; X, Y) only when (X, Y) satisfy |X − Y|_∞ ≤ 1?
–Is I(M; X, Y) still large?
–Let us go back to protocols correct on each (X, Y) w.h.p.
Redefine μ = λ^n, where (X_i, Y_i) ∼ λ is random subject to |X_i − Y_i| ≤ 1.
IC(g) = inf_ψ I(ψ; J, K), where ψ ranges over all 2/3-correct 1-way protocols for g, and J, K ∼ λ.
Is I(M; X, Y) = Ω(n) · IC(g)?

The Embedding Step
I(M; X, Y) ≥ Σ_i I(M; X_i, Y_i)
We need to show I(M; X_i, Y_i) ≥ IC(g) for each i.
[figure: Alice embeds J and Bob embeds K into the i-th coordinates of X and Y]
Suppose Alice and Bob could fill in the remaining coordinates j of X, Y so that (X_j, Y_j) ∼ λ.
Then we would get a correct protocol for g!

Conditional Information Cost
(X_j, Y_j) ∼ λ is not a product distribution.
[BJKS] Define D = ((P_1, V_1), …, (P_n, V_n)):
–P_j uniform on {Alice, Bob}
–V_j uniform on {1, …, B} if P_j = Alice
–V_j uniform on {0, …, B−1} if P_j = Bob
–If P_j = Alice, then Y_j = V_j and X_j is uniform on {V_j − 1, V_j}
–If P_j = Bob, then X_j = V_j and Y_j is uniform on {V_j, V_j + 1}
X and Y are independent conditioned on D!
I(M; X, Y | D) = Ω(n) · IC(g | (P, V)) now holds.
IC(g | (P, V)) = inf_ψ I(ψ; J, K | (P, V)), where ψ ranges over all 2/3-correct protocols for g, and J, K ∼ λ.
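The conditioning trick is easy to sanity-check in code: once (P_j, V_j) is fixed, one of the two coordinates is deterministic, so X_j and Y_j are trivially independent given D, and the pair always satisfies |X_j − Y_j| ≤ 1. A sketch of the single-coordinate sampler, following the bullet points above:

```python
import random

B = 10

def sample_coordinate():
    """Sample (P, V) and then (X_j, Y_j) as in the conditioning above."""
    p = random.choice(["Alice", "Bob"])
    if p == "Alice":
        v = random.randint(1, B)        # V uniform on {1, ..., B}
        y = v                           # Bob's coordinate is fixed to V
        x = random.choice([v - 1, v])   # Alice's coordinate uniform on {V-1, V}
    else:
        v = random.randint(0, B - 1)    # V uniform on {0, ..., B-1}
        x = v                           # Alice's coordinate is fixed to V
        y = random.choice([v, v + 1])   # Bob's coordinate uniform on {V, V+1}
    return p, v, x, y
```

The asymmetric ranges for V keep both coordinates inside {0, …, B} in every branch.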

Primitive Problem
We need to lower bound IC(g | (P, V)).
For fixed P = Alice and V = v, this is I(ψ; J), where J is uniform over {v − 1, v}.
From previous lectures: I(ψ; J) ≥ D_JS(ψ_{v−1,v}, ψ_{v,v})
IC(g | (P, V)) ≥ E_v[D_JS(ψ_{v−1,v}, ψ_{v,v}) + D_JS(ψ_{v,v}, ψ_{v,v+1})]/2
Forget about distributions; let's move to unit vectors!
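For concreteness, the Jensen–Shannon divergence used here, i.e. the average KL divergence to the midpoint distribution (measured in bits), can be computed as follows for finite transcript distributions given as probability vectors. A sketch, not the lecture's notation:

```python
import math

def kl(p, q):
    """KL divergence in bits between aligned probability vectors p, q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def d_js(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return (kl(p, m) + kl(q, m)) / 2
```

D_JS is symmetric and bounded by 1 bit, attained by distributions with disjoint supports, which matches its role here: a transcript can carry at most 1 bit of information about a uniform bit J.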

Hellinger Distance
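The "unit vectors" view maps each distribution p to the unit vector (√p_i)_i; the squared Hellinger distance is then half the squared Euclidean distance between these vectors: h²(p, q) = 1 − Σ_i √(p_i q_i) = ½ Σ_i (√p_i − √q_i)². A minimal sketch of both equivalent forms:

```python
import math

def hellinger_sq(p, q):
    # h^2(p, q) = 1 - sum_i sqrt(p_i * q_i), for probability vectors p, q
    return 1 - sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def hellinger_sq_vec(p, q):
    # Equivalent form: half the squared Euclidean distance between
    # the unit vectors (sqrt(p_i))_i and (sqrt(q_i))_i
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
```

This geometric form is what makes the Pythagorean-style arguments on the next slide possible.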

Pythagorean Property

Lower Bounding the Primitive Problem

Direct Sum Wrapup
Ω(n/B²) bound for Gap_∞(x, y).
A similar argument gives an Ω(n) bound for disjointness [BJKS].
[MYW] Sometimes one can "beat" a direct sum: solving all n copies simultaneously with constant probability is as hard as solving each copy with probability 1 − 1/n.
–E.g., the 1-way communication complexity of Equality.
Direct sums are nice, but often a problem can't be split into simpler smaller problems; e.g., there is no known embedding step for Gap-Hamming.