Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error
T.S. Jayram, David Woodruff (IBM Almaden)

Data Stream Model
Have a stream of m updates to an n-dimensional vector v:
– each update adds x to coordinate i
– Insertion model: all updates x are positive
– Turnstile model: x can be positive or negative
– stream length and update magnitudes are < poly(n)
Estimate statistics of v:
– # of distinct elements F_0
– L_p-norm |v|_p = (Σ_i |v_i|^p)^{1/p}
– entropy
– and so on
Goal: output a (1+ε)-approximation with limited memory
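To make the model concrete, here is a minimal Python sketch (illustrative, not from the talk) of the turnstile update semantics and the statistics being estimated; a real streaming algorithm must of course avoid storing v explicitly.

```python
# Turnstile update model: a stream of (i, x) pairs updates an n-dimensional
# vector v. Here v is stored exactly only to pin down the semantics.
def process_stream(n, stream):
    v = [0] * n
    for i, x in stream:              # turnstile: x may be negative
        v[i] += x
    return v

def f0(v):                           # F_0 = # of non-zero entries
    return sum(1 for vi in v if vi != 0)

def lp_norm(v, p):                   # |v|_p = (sum_i |v_i|^p)^(1/p)
    return sum(abs(vi) ** p for vi in v) ** (1.0 / p)

v = process_stream(5, [(0, 3), (2, -1), (0, -3), (4, 7)])
print(f0(v), lp_norm(v, 2))          # v = (0, 0, -1, 0, 7): F_0 = 2, |v|_2 ≈ 7.07
```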

Lots of Optimal Papers
Lots of optimal results:
– An optimal algorithm for the distinct elements problem [KNW]
– Fast moment estimation in optimal space [KNPW]
– A near-optimal algorithm for estimating the entropy of a stream [CCM]
– Optimal approximations of the frequency moments of data streams [IW]
– A near-optimal algorithm for L1-difference [NW]
– Optimal space lower bounds for all frequency moments [W]
This paper:
– Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error

What Is Optimal?
F_0 = # of non-zero entries in v
For a stream of indices in {1, …, n}, our algorithm computes a (1+ε)-approximation using an optimal O(ε^{-2} + log n) bits of space, with 2/3 success probability.
The success probability can be amplified by independent repetition, but boosting it to high probability, say 1 − 1/n, multiplies the space by a log n factor.
So "optimal" algorithms are only optimal when constant success probability suffices.
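For concreteness, a minimal sketch of amplification by independent repetition (the standard median trick, an assumption on my part rather than pseudocode from the talk): the median of O(log 1/δ) independent runs of a 2/3-success estimator succeeds with probability 1 − δ, by a Chernoff bound.

```python
import math
import statistics

def amplify(estimator, delta):
    # Each run lands in the (1 ± eps) window w.p. >= 2/3, so the median of
    # r = O(log 1/delta) runs lands in it w.p. >= 1 - delta (Chernoff).
    # The constant 18 is illustrative.
    r = max(1, math.ceil(18 * math.log(1.0 / delta)))
    return statistics.median(estimator() for _ in range(r))
```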

Can We Improve the Lower Bounds?
Alice holds x ∈ {0,1}^{ε^{-2}}, Bob holds y ∈ {0,1}^{ε^{-2}}
Gap-Hamming: either Δ(x,y) > ½ + ε or Δ(x,y) < ½ − ε
Lower bound of Ω(ε^{-2}) with 1/3 error probability
But upper bound of ε^{-2} with 0 error probability (Alice can just send all of x)

Our Results

Streaming Results
Independent repetition is optimal!
Estimating the L_p-norm in the turnstile model up to 1+ε w.p. 1−δ:
– Ω(ε^{-2} log n log 1/δ) bits for any p
– [KNW] get O(ε^{-2} log n log 1/δ) for 0 ≤ p ≤ 2
Estimating F_0 in the insertion model up to 1+ε w.p. 1−δ:
– Ω(ε^{-2} log 1/δ + log n) bits
– [KNW] get O(ε^{-2} log 1/δ) for ε^{-2} > log n
Estimating entropy in the turnstile model up to 1+ε w.p. 1−δ:
– Ω(ε^{-2} log n log 1/δ) bits
– Improves the Ω(ε^{-2} log n) bound of [KNW]

Johnson-Lindenstrauss Transforms
Let A be a random matrix such that with probability 1−δ, for any fixed q ∈ R^d: |Aq|_2 = (1 ± ε)|q|_2
[JL]: A can be an (ε^{-2} log 1/δ) × d matrix
– Gaussian or sign entries work
[Alon]: A needs Ω((ε^{-2} log 1/δ) / log(1/ε)) rows
Our result: A needs Ω(ε^{-2} log 1/δ) rows
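A minimal sketch of the sign-entry construction mentioned above (the row count and the constant 6 are illustrative assumptions, not the paper's tight bound):

```python
import math
import random

def jl_sign_transform(q, eps, delta, seed=0):
    # k = O(eps^-2 log 1/delta) rows of i.i.d. +-1/sqrt(k) entries; then
    # |Aq|_2 = (1 ± eps)|q|_2 with probability >= 1 - delta for fixed q.
    k = math.ceil(6 * math.log(1.0 / delta) / eps ** 2)
    rng = random.Random(seed)
    return [sum(rng.choice((-1, 1)) * qj for qj in q) / math.sqrt(k)
            for _ in range(k)]

q = [1.0, 2.0, 3.0]
Aq = jl_sign_transform(q, eps=0.5, delta=0.1)
norm = lambda v: math.sqrt(sum(x * x for x in v))
print(norm(q), norm(Aq))     # agree up to a (1 ± eps) factor w.h.p.
```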

Communication Complexity Separation
Alice holds x, Bob holds y; they compute f(x,y) ∈ {0,1}
D_{ρ,1/3}(f) = communication of the best 1-way deterministic protocol that errs w.p. 1/3 on distribution ρ
[KNR]: R^{||}_{1/3}(f) = max over product distributions μ × λ of D_{μ×λ,1/3}(f)

Communication Complexity Separation
f(x,y) ∈ {0,1}
VC-dimension: the maximum number r of columns for which all 2^r rows occur in the communication matrix restricted to those columns
[KNR]: R^{||}_{1/3}(f) = Θ(VC-dimension(f))
Our result: there exist f and g, each with VC-dimension k, but:
R^{||}_δ(f) = Θ(k log 1/δ) while R^{||}_δ(g) = Θ(k)

Our Techniques

Lopsided Set Intersection (LSI)
Alice holds S ⊂ {1, 2, …, U} with |S| = 1/ε^2; Bob holds T ⊂ {1, 2, …, U} with |T| = 1/δ; here U = 1/ε^2 · 1/δ
Question: Is S ∩ T = ∅?
– Alice cannot describe S with o(ε^{-2} log U) bits
– If S and T are uniform, then with constant probability S ∩ T = ∅
– R^{||}_{1/3}(LSI) ≥ D_{uniform,1/3}(LSI) = Ω(ε^{-2} log 1/δ)

Lopsided Set Intersection (LSI2)
Alice holds S ⊂ {1, 2, …, U} with |S| = 1/ε^2; Bob holds T ⊂ {1, 2, …, U} with |T| = 1; here U = 1/ε^2 · 1/δ
Question: Is S ∩ T = ∅?
– R^{||}_{δ/3}(LSI2) ≥ R^{||}_{1/3}(LSI) = Ω(ε^{-2} log 1/δ)
– Via a union bound over the set elements in an LSI instance

Low-Error Inner Product
Alice holds x ∈ {0, ε}^U with |x|_2 = 1; Bob holds y ∈ {0, 1}^U with |y|_2 = 1; here U = 1/ε^2 · 1/δ
Question: Does ⟨x, y⟩ = 0?
Estimating ⟨x, y⟩ up to ε w.p. 1−δ solves LSI2 w.p. 1−δ, so:
R^{||}_δ(inner-product_ε) = Ω(ε^{-2} log 1/δ)
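A toy illustration (my own, with an unrealistically small U) of the encoding behind this reduction: Alice's set becomes x ∈ {0, ε}^U, Bob's single element becomes an indicator y, and ⟨x, y⟩ is ε or 0 according to membership.

```python
def encode_alice(S, U, eps):
    # |S| = 1/eps^2 nonzero entries of value eps, so |x|_2 = 1
    return [eps if u in S else 0.0 for u in range(U)]

def encode_bob(j, U):
    # indicator of Bob's single element, so |y|_2 = 1
    return [1.0 if u == j else 0.0 for u in range(U)]

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

eps, U = 0.5, 8            # toy scale; the talk takes U = 1/eps^2 * 1/delta
S, j = {0, 1, 2, 3}, 2     # |S| = 1/eps^2 = 4
x, y = encode_alice(S, U, eps), encode_bob(j, U)
print(inner(x, y))         # 0.5 = eps since j ∈ S; it would be 0.0 otherwise
```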

L_2-estimation_ε
Alice holds x ∈ {0, ε}^U with |x|_2 = 1; Bob holds y ∈ {0, 1}^U with |y|_2 = 1; here U = 1/ε^2 · 1/δ
Question: What is |x−y|_2?
– |x−y|_2^2 = |x|_2^2 + |y|_2^2 − 2⟨x,y⟩ = 2 − 2⟨x,y⟩
– Estimating |x−y|_2 up to a (1 + Θ(ε)) factor solves inner-product_ε
– So R^{||}_δ(L_2-estimation_ε) = Ω(ε^{-2} log 1/δ)
– The log 1/δ factor is new, but we want an Ω(ε^{-2} log n log 1/δ) lower bound
– Can use a known trick to get an extra log n factor
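A quick numeric check (illustrative) of the identity on this slide for unit vectors:

```python
# |x - y|_2^2 = |x|_2^2 + |y|_2^2 - 2<x,y> = 2 - 2<x,y> when |x|_2 = |y|_2 = 1
x = [0.5, 0.5, 0.5, 0.5, 0.0]    # eps = 1/2, four coordinates set, |x|_2 = 1
y = [0.0, 0.0, 1.0, 0.0, 0.0]    # indicator vector, |y|_2 = 1
ip = sum(a * b for a, b in zip(x, y))
print(sum((a - b) ** 2 for a, b in zip(x, y)), 2 - 2 * ip)   # both 1.0
```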

Augmented Lopsided Set Intersection (ALSI2)
Universe [U] = [1/ε^2 · 1/δ]
Alice holds S_1, …, S_r ⊂ [U] with |S_i| = 1/ε^2 for all i
Bob holds j ∈ [U], an index i* ∈ {1, 2, …, r}, and the sets S_{i*+1}, …, S_r
Question: Is j ∈ S_{i*}?
R^{||}_{1/3}(ALSI2) = Ω(r ε^{-2} log 1/δ)

Reduction of ALSI2 to L_2-estimation_ε
Alice encodes S_1, S_2, …, S_r as vectors x_1, x_2, …, x_r and forms x = Σ_i 10^i · x_i
Bob forms y from y_{i*} (encoding j) and x_{i*+1}, …, x_r (from S_{i*+1}, …, S_r)
– y − x = 10^{i*} y_{i*} − Σ_{i ≤ i*} 10^i · x_i
– |y−x|_2 is dominated by 10^{i*} |y_{i*} − x_{i*}|_2
– Set r = Θ(log n)
– R^{||}_δ(L_2-estimation_ε) = Ω(ε^{-2} log n log 1/δ)
– Streaming Space ≥ R^{||}_δ(L_2-estimation_ε)
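A toy sketch (my reading of the slide; scales and sizes are illustrative) of the scale embedding: Alice places each encoded set at its own power-of-10 scale, Bob cancels the scales above i*, and the i*-th block dominates |y − x|_2.

```python
import math

def combine(blocks):
    # place block i (0-indexed) at scale 10^(i+1) by concatenation
    return [10 ** (i + 1) * v for i, block in enumerate(blocks) for v in block]

eps, U = 0.5, 8
S = [{0, 1, 2, 3}, {4, 5, 6, 7}, {1, 3, 5, 7}]          # r = 3 toy sets
xs = [[eps if u in Si else 0.0 for u in range(U)] for Si in S]
x = combine(xs)                                          # Alice's vector

i_star, j = 2, 5                                         # Bob's index (1-based) and element
y_blocks = [[0.0] * U for _ in range(i_star - 1)]        # scales below i*: zero
y_blocks.append([1.0 if u == j else 0.0 for u in range(U)])  # y_{i*} encodes j
y_blocks += xs[i_star:]                                  # Bob knows S_{i*+1}, ..., S_r
y = combine(y_blocks)

print(math.sqrt(sum((a - b) ** 2 for a, b in zip(y, x))))
# dominated by the 10^{i*} block, whose norm reveals whether j ∈ S_{i*}
```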

Lower Bounds for Johnson-Lindenstrauss
Alice holds x ∈ {−n^{O(1)}, …, n^{O(1)}}^t; Bob holds y ∈ {−n^{O(1)}, …, n^{O(1)}}^t
Use public randomness to agree on a JL matrix A
Alice sends Ax; Bob computes Ax − Ay = A(x−y) and |A(x−y)|_2
– Can estimate |x−y|_2 up to 1+ε w.p. 1−δ
– #rows(A) = Ω(r ε^{-2} log 1/δ / log n)
– Set r = Θ(log n)
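A toy rendering (my own; a sign matrix stands in for a generic JL matrix) of the one-way protocol this slide describes: public randomness fixes A, Alice's single message is Ax, and each of its entries takes O(log n) bits, which is why communication bounds translate into row bounds.

```python
import math
import random

def sign_matrix(k, d, rng):
    return [[rng.choice((-1, 1)) / math.sqrt(k) for _ in range(d)] for _ in range(k)]

def apply_matrix(A, v):
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]

rng = random.Random(42)                 # public randomness fixes A for both players
d, k = 10, 800
A = sign_matrix(k, d, rng)
x = [rng.randint(-100, 100) for _ in range(d)]   # Alice's input
y = [rng.randint(-100, 100) for _ in range(d)]   # Bob's input
Ax = apply_matrix(A, x)                 # the only message sent
diff = [a - b for a, b in zip(Ax, apply_matrix(A, y))]   # Bob forms A(x - y)
est = math.sqrt(sum(v * v for v in diff))
true = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
print(est, true)                        # agree up to a (1 ± eps) factor w.h.p.
```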

Low-Error Hamming Distance
Universe = [n]
Alice holds x ∈ {0,1}^n; Bob holds y ∈ {0,1}^n
Δ(x,y) = Hamming distance between x and y
R^{||}_δ(Δ(x,y)_ε) = Ω(ε^{-2} log 1/δ log n)
– Via a reduction from ALSI2
– Gap-Hamming to LSI2 reductions with low error
Implies our lower bounds for estimating:
– any L_p-norm
– distinct elements
– entropy

Conclusions
Proved the first streaming space lower bounds that depend on the error probability δ:
– Optimal for L_p-norms and distinct elements
– Improves the lower bound for entropy
– Optimal dimensionality bound for JL transforms
Adds several twists to augmented indexing proofs:
– Augmented indexing with a small set in a large domain
– Builds upon lopsided set disjointness lower bounds
– Uses multiple Gap-Hamming to Indexing reductions that handle low error

ALSI2 to Hamming Distance
Alice holds S_1, …, S_r ⊂ [1/ε^2 · 1/δ] with |S_i| = 1/ε^2 for all i; Bob holds j ∈ [U], i* ∈ {1, 2, …, r}, and S_{i*+1}, …, S_r
– Let t = 1/ε^2 log 1/δ
– Use the public coin to generate random strings b_{i,k} ∈ {0,1}^t, one per instance i and universe element k
– Alice sets x_i = majority_{k ∈ S_i} b_{i,k} (bitwise majority)
– Bob sets y_i = b_{i,j}
– Embed the multiple copies by duplicating coordinates at different scales
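A toy illustration (my own; parameters shrunk, odd set size to avoid majority ties) of why this encoding works: if j ∈ S_i, then y_i = b_{i,j} is one of the strings whose bitwise majority forms x_i, so Δ(x_i, y_i) sits noticeably below t/2; otherwise y_i is independent of x_i and Δ(x_i, y_i) ≈ t/2.

```python
import random

def bitwise_majority(strings):
    # majority vote in each of the t bit positions (ties broken toward 0)
    return [1 if 2 * sum(s[p] for s in strings) > len(strings) else 0
            for p in range(len(strings[0]))]

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

rng = random.Random(0)
t, U = 4096, 16
b = [[rng.randint(0, 1) for _ in range(t)] for _ in range(U)]   # public coin
S_i = set(range(9))                        # Alice's set (odd size: no ties)
x_i = bitwise_majority([b[k] for k in S_i])
print(hamming(x_i, b[3]) / t)              # j = 3 ∈ S_i: noticeably below 1/2
print(hamming(x_i, b[12]) / t)             # j = 12 ∉ S_i: close to 1/2
```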