Data Streams and Applications in Computer Science. David Woodruff, IBM Almaden. Presburger Lecture, ICALP 2014.

Thanks to my advisors Prof. Ron Rivest, Prof. Piotr Indyk, and Prof. Andy Yao. Thanks for your mentorship and research advice, and early guidance on a path in theoretical computer science

and my amazing summer interns: Arnab Bhattacharyya, Jelani Nelson, Huy Nguyen, Marco Molinaro, Yi Li, Eric Price, Grigory Yaroslavtsev

and my awesome collaborators in the theory group at IBM and throughout the world!

My current research interests: Communication Complexity, Data Stream Algorithms and Lower Bounds, Graph Algorithms, Machine Learning, Numerical Linear Algebra, Sketching, Sparse Recovery

Talk Outline Data Stream Model and Sample Results –Distinct Elements –Frequency Moments –Characterization of Algorithms Connections to Other Areas –Compressed Sensing –Linear Algebra –Machine Learning

Data Streams A data stream is a sequence of data that is too large to be stored in available memory Examples: –Internet search logs –Network traffic –Financial transactions –Sensor networks –Scientific data streams (astronomical, genomics, physical simulations)…

Streaming Model Stream of elements a_1, …, a_m, each in {1, …, n} Single or small number of passes over the data Algorithms should work for any ordering of elements Almost all algorithms are randomized and approximate –Usually necessary to achieve efficiency –Randomness is in the algorithm, not the input Goals: minimize space complexity (in bits), processing time …

Vector Interpretation Think of the stream as maintaining an n-dimensional vector x –Initially, x = 0^n –Insertion of i is interpreted as x_i = x_i + 1 –Output an approximation to f(x) with high probability
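
A minimal illustration of this vector view in Python (not from the talk; the vector is materialized here only for exposition, since a streaming algorithm must approximate f(x) in sublinear space):

    import numpy as np

    n = 8
    x = np.zeros(n, dtype=int)         # the implicit frequency vector, x = 0^n
    stream = [3, 1, 3, 7, 1, 3]        # insertions (0-indexed coordinates)
    for i in stream:
        x[i] += 1                      # insertion of i: x_i = x_i + 1
    print(x)                           # [0 2 0 3 0 0 0 1]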

(1) Distinct Elements Streaming model originated in work of Flajolet and Martin, ‘85 –Studied the distinct elements question –# of distinct elements, denoted F_0, is |{i : x_i > 0}| –Output a number Z with F_0 ≤ Z ≤ (1+ε) F_0 with 99% probability –Can we do better than just storing all the coordinates of x? –Yes, and tight bounds are known [Indyk, W], [W], [Kane, Nelson, W]: Θ(1/ε^2 + log n) bits of space, O(1) processing time –Connections: to prove the tight lower bound, the Gap-Hamming communication problem was introduced
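
For concreteness, here is a simple bottom-k ("k minimum values") estimator in the same spirit, using O(k) stored hash values; this is a sketch for illustration, not the optimal [Kane, Nelson, W] algorithm, and the hash construction and parameters are assumptions:

    import random

    def f0_estimate(stream, k=128, seed=0):
        # Bottom-k (KMV) estimator: hash elements to (0,1) and keep only the
        # k smallest distinct hash values; if v_k is the k-th smallest hash,
        # then F_0 is approximately (k - 1) / v_k.
        rng = random.Random(seed)
        p = (1 << 61) - 1                        # Mersenne prime for 2-wise hashing
        a, b = rng.randrange(1, p), rng.randrange(p)
        kept = set()                             # the k smallest distinct hashes
        for i in stream:
            h = ((a * i + b) % p) / p
            if len(kept) < k:
                kept.add(h)
            elif h < max(kept) and h not in kept:
                kept.remove(max(kept))
                kept.add(h)
        if len(kept) < k:                        # fewer than k distinct: exact count
            return len(kept)
        return (k - 1) / max(kept)

    print(f0_estimate(list(range(1000)) * 5))    # F_0 = 1000; prints roughly 1000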

Gap-Hamming Problem x ∈ {0,1}^n, y ∈ {0,1}^n Promise: Hamming distance satisfies Δ(x,y) > n/2 + εn or Δ(x,y) < n/2 − εn Lower bound of Ω(1/ε^2) for randomized 1-way communication [Indyk, W], [W], [Jayram, Kumar, Sivakumar] Same for 2-way communication [Chakrabarti, Regev] Applications: in information complexity, functional monitoring, embeddings, linear algebra, differential privacy, sparsifiers, … (Andoni, Brody, Clarkson, de Wolf, Jayram, Krauthgamer, McGregor, Mironov, Pitassi, Reingold, Sherstov, Talwar, Vadhan, Vidick, W, Zhang…)

(2) Frequency Moments Streaming model revived in work of Alon, Matias, and Szegedy, ’96 [AMS] Consider the more general turnstile streaming model [coined by Muthukrishnan] –positive and negative updates, so x_i = x_i + 1 or x_i = x_i − 1 –summarize statistics of the difference x − y of two streams of insertions

Frequency Moments [AMS] study the frequency moments F_p = Σ_{i=1}^n |x_i|^p, or equivalently the l_p-norms –Summarize the skewness of an empirical distribution –F_2 used in computing self-join sizes, geometry and linear algebra –F_1 used for measuring distance between distributions, and in “robust” algorithms (regression, subspace approximation) (figure: a flat vs. a skewed frequency distribution)

Frequency Moments Output a number Z with F_p ≤ Z ≤ (1+ε) F_p with 99% probability Near-tight bounds known (Andoni, Bar-Yossef, Braverman, Chakrabarti, Coppersmith, Cormode, Ganguly, Gronemeier, Indyk, Jayram, Kane, Krauthgamer, Kumar, Li, Nelson, Porat, Sivakumar, Sun, W, …) Any guesses on how the space bounds depend on p?

Frequency Moments F_2 is the “breaking point” –F_p for p ≤ 2 doable in Õ(1) bits of space –F_p for p > 2 requires Θ̃(n^{1−2/p}) bits of space Algorithms achieve Õ(1) processing times Connections: “sub-sampling + heavy hitters” technique for the upper bound –Used in many data stream, embedding, and linear algebra problems: earthmover distance, mixed norms, sampling in the turnstile model, compressed sensing, graph sparsifiers, regression –Optimally solves Σ_{i=1}^n G(x_i) problems [Braverman, Ostrovsky]
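
As a concrete example of the p ≤ 2 regime, a bare-bones version of the [AMS] F_2 sketch (a sketch under simplifying assumptions: it stores the random signs explicitly in O(n) space and averages instead of taking medians of means; real implementations generate the signs with 4-wise independent hashing in Õ(1) space):

    import random

    def ams_f2(stream, n, r=200, seed=0):
        # Maintain r counters z_j = sum_i s_j(i) * x_i with random signs
        # s_j(i) in {-1, +1}; each z_j^2 has expectation F_2 = sum_i x_i^2.
        rng = random.Random(seed)
        signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(r)]
        z = [0] * r
        for i in stream:                         # each update is x_i += 1
            for j in range(r):
                z[j] += signs[j][i]
        return sum(zj * zj for zj in z) / r

    stream = [0] * 10 + [1] * 3 + [2] * 4        # x = (10, 3, 4), F_2 = 125
    print(ams_f2(stream, n=3))                   # prints roughly 125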

Subsampling + Heavy Hitters CountSketch [Charikar, Chen, Farach-Colton]: –Give each coordinate i a random sign σ(i) ∈ {−1, 1} –Randomly partition coordinates into B buckets via a hash function h, maintain c_j = Σ_{i : h(i) = j} σ(i) · x_i in the j-th bucket –Estimate x_i as σ(i) · c_{h(i)} –Estimation error ≈ |x|_2 / √B –Can be used to find the “heavy hitters” –It is a linear map x → S · x, easy to maintain under updates
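
A minimal Python sketch of a single CountSketch row (illustrative only: the full algorithm takes a median over O(log n) independent rows, and the hash functions here are assumptions for the demo):

    import random

    class CountSketch:
        P = (1 << 61) - 1                        # Mersenne prime for hashing

        def __init__(self, B, seed=0):
            rng = random.Random(seed)
            self.B = B
            self.h = (rng.randrange(1, self.P), rng.randrange(self.P))
            self.s = (rng.randrange(1, self.P), rng.randrange(self.P))
            self.c = [0] * B                     # the sketch S * x

        def _bucket(self, i):                    # h(i): which of the B buckets
            a, b = self.h
            return ((a * i + b) % self.P) % self.B

        def _sign(self, i):                      # sigma(i) in {-1, +1}
            a, b = self.s
            return 1 if ((a * i + b) % self.P) % 2 == 0 else -1

        def update(self, i, delta=1):            # turnstile update x_i += delta
            self.c[self._bucket(i)] += self._sign(i) * delta

        def estimate(self, i):                   # estimate of x_i
            return self._sign(i) * self.c[self._bucket(i)]

    cs = CountSketch(B=256)
    for i, delta in [(5, 3), (9, -2), (5, 1)]:
        cs.update(i, delta)
    print(cs.estimate(5), cs.estimate(9))        # 4 and -2, up to collision noise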

Subsampling + Heavy Hitters Subsampling [Indyk, W]: –Create a nested sequence of subsets of [n] –[n] = L_{log n} ⊇ L_{log n − 1} ⊇ … ⊇ L_0 –L_i contains about 2^i random coordinates –Run CountSketch to find the heavy hitters of each x_{L_i} –Estimate the number of coordinates “at every scale” –Obtain a rough approximation x’ to x (figure: histogram of value vs. number of coordinates for x ∈ R^n)
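
One simple way to realize nested levels of this kind (a toy construction, not the exact one from [Indyk, W]): hash each coordinate once to a uniform value and let level l keep the coordinates hashing below 2^l/n, so the levels are nested and |L_l| ≈ 2^l in expectation:

    import random

    n = 1 << 10
    rng = random.Random(0)
    u = {i: rng.random() for i in range(n)}      # one hash value per coordinate

    def level(l):
        # i is in L_l iff u(i) < 2^l / n; the thresholds grow with l, so the
        # levels are nested: L_0 within L_1 within ... within L_{log n} = [n].
        return {i for i in range(n) if u[i] < (1 << l) / n}

    for l in (0, 5, 10):
        print(l, len(level(l)))                  # roughly 1, 32, 1024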

(3) Characterization of Turnstile Algorithms All known algorithms in the turnstile model have the form: 1. Choose a random matrix A independent of x 2. Maintain the “linear sketch” Ax in the stream 3. Output a function of Ax Question (?!): does the optimal algorithm for any function in the turnstile model have this form? [Li, Nguyen, W] Yes, up to a factor of log n in the space –Some caveats, e.g., can’t necessarily store A in low space –For lower bounds this doesn’t matter; it gives a simpler proof strategy, since one just needs to rule out linear sketches
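
In code, linearity is exactly what makes the sketch maintainable (a toy dense example for illustration; practical sketches use structured A whose columns need not be stored):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 100, 10
    A = rng.standard_normal((k, n))              # chosen before seeing the stream
    sketch = np.zeros(k)
    for i, delta in [(3, +1), (7, +1), (3, -1)]: # turnstile updates x_i += delta
        sketch += delta * A[:, i]                # maintaining Ax: add delta * column i
    # sketch now equals A @ x for the final x, for any ordering of the updates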

Talk Outline Data Stream Model and Sample Results –Distinct Elements –Frequency Moments –Characterization of Algorithms Connections to Other Areas –Compressed Sensing –Linear Algebra –Machine Learning

Compressed Sensing Compute a sketch A · x with a small number of rows (a.k.a. measurements) Output x’ which approximates x in the sense that |x’ − x|_p ≤ (1+ε) |x − x_k|_q, where x_k is the best k-sparse approximation to x Similar to the heavy hitters problem solved by CountSketch Variations of CountSketch + subsampling: can design algorithms with a near-optimal number of measurements as a function of ε, k, p, q [Price, W] For p = q = 2, can reduce the number of measurements by adaptively invoking CountSketch [Indyk, Price, W]
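
A toy end-to-end recovery in this spirit (a single CountSketch row with a naive decoder that scans all coordinates; the actual [Price, W] and [Indyk, Price, W] schemes are far more refined, and all parameters here are assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    n, B, k = 10000, 400, 10
    rows = rng.integers(0, B, size=n)            # bucket h(i) for each coordinate
    signs = rng.choice((-1.0, 1.0), size=n)      # sign sigma(i)

    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = 10 * rng.standard_normal(k)
    x += 0.1 * rng.standard_normal(n)            # small dense tail

    c = np.zeros(B)
    np.add.at(c, rows, signs * x)                # the B measurements A @ x
    est = signs * c[rows]                        # estimate of every coordinate
    top = np.argsort(-np.abs(est))[:k]           # keep the k largest estimates
    x_prime = np.zeros(n)
    x_prime[top] = est[top]
    print(np.linalg.norm(x_prime - x))           # compare against |x - x_k|_2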

Linear Algebra Least squares regression –Fitting points to a line, or more generally a subspace –min_x |Ax − b|_2 for an n x d matrix A and n x 1 vector b –Typically n >> d, i.e., the problem is over-constrained

Linear Algebra If S is a random projection matrix: –compute S·A and S·b –solve min_x |SAx − Sb|_2 –Intuition: randomly rotate the column span of [A, b], then drop all but the first O(d) coordinates. For example, (0, 0, 0, …, 0, 1) ∈ R^n becomes approximately (±1/n^{1/2}, …, ±1/n^{1/2}) after rotation; dropping all but the first d coordinates and rescaling by (n/d)^{1/2} gives (±1/d^{1/2}, …, ±1/d^{1/2}) ∈ R^d
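
A quick numerical check of this intuition (illustrative only):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1024, 16
    Q = np.linalg.qr(rng.standard_normal((n, n)))[0]   # random rotation
    v = Q[:, -1]                                       # image of e_n = (0, ..., 0, 1)
    print(np.abs(v).mean())                            # entries of size about 1/sqrt(n)
    print(np.linalg.norm(v[:d]) * np.sqrt(n / d))      # roughly |e_n|_2 = 1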

Linear Algebra 1+ε approximation in O(nd log n) + poly(d/ε) time using Fast Johnson–Lindenstrauss Transforms (a restricted family of projections) If we replace S with CountSketch, this still works! [Clarkson, W] –Leads to O(nnz(A)) + poly(d/ε) running time, where nnz(A) is the number of non-zero entries of A Low Rank Approximation –Using CountSketch instead of the Fast Johnson–Lindenstrauss Transform improves the running time from O(nd log n) to O(nnz(A)) [Clarkson, W] Beautiful followup works by Li, Mahoney, Meng, Miller, Nelson, Nguyen, Peng
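
A compact sketch-and-solve regression demo with a CountSketch-style S (parameters are assumptions; note that applying S touches each nonzero of A exactly once, which is the source of the O(nnz(A)) running time):

    import numpy as np

    def sketch_and_solve(A, b, k, seed=0):
        # Draw a CountSketch-style S with k rows (one +/-1 entry per column
        # of S), then solve the smaller problem min_x |S A x - S b|_2.
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        rows = rng.integers(0, k, size=n)        # which sketch row gets row i of A
        signs = rng.choice((-1.0, 1.0), size=n)
        SA = np.zeros((k, A.shape[1]))
        np.add.at(SA, rows, signs[:, None] * A)  # S @ A without forming S
        Sb = np.zeros(k)
        np.add.at(Sb, rows, signs * b)
        return np.linalg.lstsq(SA, Sb, rcond=None)[0]

    rng = np.random.default_rng(1)
    n, d = 10000, 20
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)
    x_opt = np.linalg.lstsq(A, b, rcond=None)[0]
    x_sk = sketch_and_solve(A, b, k=2000)
    print(np.linalg.norm(A @ x_sk - b) / np.linalg.norm(A @ x_opt - b))  # ~ 1 + eps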

Machine Learning CountSketch can be used to estimate inner products –Estimate <x, y> as <Sx, Sy> –E[<Sx, Sy>] = <x, y> –Var[<Sx, Sy>] ≤ |x|^2 |y|^2 / B Replace expensive inner product computations in classification algorithms with approximations via CountSketch –perceptron and minimum enclosing ball [Clarkson, Hazan, W] Often interested in non-linear kernel transformations of input points: x_1, …, x_n → f(x_1), …, f(x_n) –“Tensor product” CountSketch of Pagh gives subspace embeddings of the polynomial kernel [Avron, Nguyen, W]
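
A toy check of the inner product estimate (one shared sketch applied to both vectors; parameters are assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    n, B = 5000, 500
    rows = rng.integers(0, B, size=n)
    signs = rng.choice((-1.0, 1.0), size=n)

    def sketch(v):
        s = np.zeros(B)
        np.add.at(s, rows, signs * v)            # c_j = sum_{i: h(i)=j} sigma(i) * v_i
        return s

    x = rng.standard_normal(n)
    y = x + 0.5 * rng.standard_normal(n)         # correlated, so <x, y> is large
    print(x @ y, sketch(x) @ sketch(y))          # true vs. estimated inner product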

Conclusions Many data stream and sketching techniques give efficient ways of “compressing” big data – a broadly applicable goal in computer science –Compressed sensing, graph algorithms, linear algebra, machine learning… –Recently been looking at shape-fitting and clustering problems, etc. –Also useful for proving lower bounds in other areas, e.g., number of measurements in sparse recovery [Do Ba, Indyk, Price, W] –I’m sure there are many other unexplored areas Thank you!