Sketching via Hashing: from Heavy Hitters to Compressive Sensing to Sparse Fourier Transform Piotr Indyk MIT

Outline
– Sketching via hashing
– Compressive sensing
– Numerical linear algebra (regression, low rank approximation)
– Sparse Fourier Transform

“Sketching via hashing”: a technique
Suppose that we have a sequence S of elements a_1..a_s from the range {1…n}
Want to approximately count the elements using small space
– For each element a, get an approximation of the count x_a of a in S
Method:
– Initialize an array c = [c_1,…,c_m]
– Prepare a random hash function h: {1..n} → {1..m}
– For each element a perform c_{h(a)} = c_{h(a)} + inc(a)
Result: c_j = ∑_{a: h(a)=j} x_a * inc(a)
To estimate x_a, return x*_a = c_{h(a)}/inc(a)
[Slide figure: stream a_1, a_2, a_3, a_4, …, a_s hashed into buckets c_1 … c_{h(a)} … c_m]
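A minimal Python sketch of the procedure above with inc(a) = 1; the table-based hash function, the array size m and the demo stream are illustrative choices, not from the slides.

```python
import random
from collections import Counter

def build_sketch(stream, n, m, seed=0):
    """One hash table: bucket c[h(a)] accumulates inc(a) = 1 per occurrence of a."""
    rng = random.Random(seed)
    h = [rng.randrange(m) for _ in range(n)]   # random hash h: {0..n-1} -> {0..m-1}
    c = [0] * m
    for a in stream:
        c[h[a]] += 1                           # c_{h(a)} = c_{h(a)} + inc(a)
    return c, h

def estimate(c, h, a):
    """x*_a = c_{h(a)} / inc(a)."""
    return c[h[a]]

# Tiny demo
stream = [1, 3, 3, 7, 3, 1, 5, 3]
c, h = build_sketch(stream, n=10, m=4)
true_counts = Counter(stream)
for a in sorted(set(stream)):
    print(a, true_counts[a], estimate(c, h, a))
```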

Why would this work?
We have c_{h(a)} = x_a * inc(a) + noise
Therefore x*_a = c_{h(a)}/inc(a) = x_a + noise/inc(a)
Incrementing options:
– inc(a) = 1 [FCAB'98, EV'02, CM'03, CM'04]: simply counts the total, no under-estimation
– inc(a) = ±1 [CCF'02]: noise can cancel, unbiased estimator, can under-estimate
– inc(a) = Gaussian [GGIKMS'02]: noise has a nice distribution
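A possible sketch of the inc(a) = ±1 option (in the spirit of the count-sketch idea); the sign table and function names are illustrative.

```python
import random

def build_signed_sketch(stream, n, m, seed=0):
    """inc(a) = ±1: each element gets a random sign, so colliding noise can cancel."""
    rng = random.Random(seed)
    h = [rng.randrange(m) for _ in range(n)]        # bucket hash
    sign = [rng.choice((-1, 1)) for _ in range(n)]  # inc(a)
    c = [0] * m
    for a in stream:
        c[h[a]] += sign[a]                          # c_{h(a)} = c_{h(a)} + inc(a)
    return c, h, sign

def estimate_signed(c, h, sign, a):
    """x*_a = c_{h(a)} / inc(a); unbiased, but can under-estimate."""
    return sign[a] * c[h[a]]
```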

What are the guarantees?
For simplicity, consider inc(a) = 1
Tradeoff between accuracy and space m
Definitions:
– Let k = m/C
– Let H be the set of the k heaviest coefficients of x, i.e., the “head”
– Let Tail_1^k = ||x_{-H}||_1 (i.e., the sum of the coefficients not in the head)
Will show that, with constant probability, x_a ≤ x*_a ≤ x_a + Tail_1^k/k
Meaning:
– For a stream with s elements, the error is always at most (1/k) * s
– Even better if the head is really heavy

Analysis
We show how to get an estimate x*_a ≤ x_a + Tail/k
Pr[|x*_a - x_a| > Tail/k] ≤ P_1 + P_2, where
– P_1 = Pr[a collides with (another) head element]
– P_2 = Pr[sum of tail elements colliding with a is > Tail/k]
We have
– P_1 ≤ k/m = 1/C
– P_2 ≤ (Tail/m)/(Tail/k) = k/m = 1/C
Total probability of failure ≤ 2/C
Can reduce the probability to 1/poly(n) by log n repetitions → space O(k log n)
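A rough illustration of the repetition step with inc(a) = 1: since every table over-estimates, the estimate is taken as the minimum over roughly log n independent tables (a CountMin-style choice); the skewed demo stream and all parameters are made up for illustration.

```python
import math
import random
from collections import Counter

def countmin(stream, n, m, reps, seed=0):
    """reps independent hash tables, each of size m."""
    rng = random.Random(seed)
    tables = [([rng.randrange(m) for _ in range(n)], [0] * m) for _ in range(reps)]
    for a in stream:
        for h, c in tables:
            c[h[a]] += 1
    return tables

def cm_estimate(tables, a):
    # Every table over-estimates (inc(a) = 1), so the minimum is the best guess
    return min(c[h[a]] for h, c in tables)

# Demo: two heavy elements plus a light tail; m = C*k buckets, ~log n tables
random.seed(1)
n, k, C = 1000, 10, 4
stream = [0] * 500 + [1] * 300 + [random.randrange(2, n) for _ in range(400)]
tables = countmin(stream, n, m=C * k, reps=math.ceil(math.log2(n)))
true_counts = Counter(stream)
for a in (0, 1):
    print(a, true_counts[a], cm_estimate(tables, a))
```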

Compressive sensing

Compressive sensing [Candes-Romberg-Tao, Donoho]
(also: approximation theory, learning Fourier coeffs, finite rate of innovation, …)
Setup:
– Data/signal in n-dimensional space: x. E.g., x is a 256x256 image → n = 65536
– Goal: compress x into a “measurement” Ax, where A is an m x n “measurement matrix”, m << n
Requirements:
– Plan A: want to recover x from Ax. Impossible: underdetermined system of equations
– Plan B: want to recover an “approximation” x* of x
  Sparsity parameter k
  Informally: want to recover the largest k coordinates of x
  Formally: want x* such that e.g. (L1/L1) ||x*-x||_1 ≤ C min_{x'} ||x'-x||_1 = C Tail_1^k, where the minimum is over all x' that are k-sparse (at most k non-zero entries)
Want:
– Good compression (small m = m(k,n))
– Efficient algorithms for encoding and recovery
Why linear compression?
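For concreteness, here is one standard recovery route for this setup: L1 minimization (basis pursuit) over a random Gaussian A, written as a linear program. This is just one possible decoder, not the specific scheme discussed on the following slides, and the sizes are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 50, 25, 3                       # illustrative sizes

# k-sparse signal and a random Gaussian measurement matrix
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x                                  # the m linear measurements

# Basis pursuit: min ||x'||_1  s.t.  A x' = y, as an LP with x' = u - v, u, v >= 0
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]
print("L1 recovery error:", np.linalg.norm(x_hat - x, 1))
```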

Applications
– Single pixel camera [Wakin, Laska, Duarte, Baron, Sarvotham, Takhar, Kelly, Baraniuk’06]
– Pooling experiments [Kainkaryam, Woolf’08], [Hassibi et al.’07], [Dai-Sheikh, Milenkovic, Baraniuk], [Shental-Amir-Zuk’09], [Erlich-Shental-Amir-Zuk’09], [Kainkaryam, Bruex, Gilbert, Woolf’10], …

Results
Scale: Excellent / Very Good / Good / Fair

Paper                    | R/D | Sketch length | Recovery time | Approx
[Candes-Romberg-Tao’04]  | D   | k log(n/k)    | n^c           | l2 / l1

Sketching as compressive sensing
Hashing view:
– h hashes coordinates into “buckets” c_1 … c_m
– Each bucket sums up its coordinates
Matrix view:
– A 0-1 m x n matrix A, with one 1 per column
– The a-th column has a 1 at position h(a), where h(a) is chosen uniformly at random from {1…m}
Sketch is equal to c = Ax
Guarantee: if we repeat hashing log n times then, with high probability, ||x*-x||_∞ ≤ Tail_1^k/k
– L∞/L1 guarantee, implies L1/L1
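A small numpy check of the matrix view and the repetition step. Since inc(a) = 1 here, each repetition over-estimates, so the minimum is taken; the test vector is non-negative so that this is valid, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, reps = 30, 10, 5                    # illustrative sizes

# Non-negative test vector: a heavy "head" plus a light "tail"
x = np.concatenate([np.array([50.0, 30.0, 20.0]), rng.random(n - 3)])

estimates = []
for _ in range(reps):
    h = rng.integers(0, m, size=n)        # h(a) uniform in {0..m-1}
    A = np.zeros((m, n))
    A[h, np.arange(n)] = 1.0              # one 1 per column, at row h(a)
    c = A @ x                             # the sketch c = Ax (bucket sums)
    estimates.append(c[h])                # x*_a = c_{h(a)} for every coordinate a
x_star = np.min(np.array(estimates), axis=0)   # min: each repetition over-estimates
print("L_inf error:", np.max(np.abs(x_star - x)))
```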

Results
Scale: Excellent / Very Good / Good / Fair

Paper                                          | R/D | Sketch length | Recovery time     | Approx
[CCF’02], [CM’06]                              | R   | k log n       | n log n           | l2 / l2
[CM’04]                                        | R   | k log n       | n log n           | l1 / l1
[NV’07], [DM’08], [NT’08], [BD’08], [GK’09], … | D   | k log(n/k)    | nk log(n/k) * log | l2 / l1
                                               | D   | k log^c n     | n log n * log     | l2 / l1
[IR’08], [BIR’08], [BI’09]                     | D   | k log(n/k)    | n log(n/k) * log  | l1 / l1
[Candes-Romberg-Tao’04]                        | D   | k log(n/k)    | n^c               | l2 / l1
…

Insight: Several random hash functions form an expander graph

Regression

Least-Squares Regression
A is an n x d matrix, b an n x 1 column vector
– Consider the over-constrained case, n >> d
Find d-dimensional x so that ||Ax-b||_2 ≤ (1+ε) min_y ||Ay-b||_2
Want to find the (approximately) closest point in the column space of A to the vector b

Approximation
Computing the solution exactly takes O(nd^2) time
Too slow, so ε > 0 and a tiny probability of failure are OK
Approach: subspace embedding [Sarlos’06]
– Consider the subspace L spanned by the columns of A together with b
– Want: an m x n matrix S, m “small”, such that for all y in L, ||Sy||_2 = (1±ε) ||y||_2
– Then ||S(Ax-b)||_2 = (1±ε) ||Ax-b||_2 for all x
– Solve argmin_y ||(SA)y - (Sb)||_2
– Given SA, Sb, can solve in poly(m) time
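A minimal sketch-and-solve example of this approach. S is a dense Gaussian sketch here purely for simplicity (the count-sketch-like S from the next slide is the fast choice), and the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10000, 20, 400                  # illustrative sizes, m = poly(d)

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Exact least squares: O(nd^2)
x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)

# Sketch-and-solve: solve the much smaller problem argmin_y ||(SA)y - (Sb)||_2
S = rng.standard_normal((m, n)) / np.sqrt(m)
x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

print("optimal residual :", np.linalg.norm(A @ x_opt - b))
print("sketched residual:", np.linalg.norm(A @ x_sk - b))
```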

Fast Dimensionality Reduction
Need an m x n dimensionality reduction matrix S such that:
– m is “close to” d, and
– the matrix-vector product Sz can be computed quickly
Johnson-Lindenstrauss: O(nm) time
Fast Johnson-Lindenstrauss: O(n log n) (randomized Hadamard transform) [AC’06, AL’11, KW’11, NPW’12]
Sparse Johnson-Lindenstrauss: O(nnz(z)*εm) [SPD+09, WDL+09, DKS10, BOR10, KN12]
Surprise! For subspace embedding, O~(nnz(z)) time and m = poly(d) suffice [CW13, NN13, MM13]
– Leads to regression and low-rank approximation algorithms with O~(nnz(A)+poly(d)) running time
[Slide figure: count-sketch-like matrix S]
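A possible implementation of applying a count-sketch-like S (one random ±1 entry per column) without ever forming S, so the cost is proportional to the number of nonzeros of the input; the helper name and sizes are illustrative.

```python
import numpy as np

def countsketch_apply(Z, m, seed=0):
    """Compute S @ Z for an implicit count-sketch-like m x n matrix S
    (one random ±1 entry per column), in time proportional to nnz(Z)."""
    n = Z.shape[0]
    rng = np.random.default_rng(seed)
    h = rng.integers(0, m, size=n)             # row of the single nonzero in column a of S
    sign = rng.choice([-1.0, 1.0], size=n)     # its sign
    SZ = np.zeros((m, Z.shape[1]))
    np.add.at(SZ, h, sign[:, None] * Z)        # scatter-add each row of Z into its bucket
    return SZ

# Usage: sketch [A | b] with the same S, then solve the small regression problem
rng = np.random.default_rng(1)
n, d, m = 20000, 10, 500                       # illustrative sizes
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal((d, 1)) + 0.1 * rng.standard_normal((n, 1))
SAb = countsketch_apply(np.hstack([A, b]), m)
x_sk, *_ = np.linalg.lstsq(SAb[:, :d], SAb[:, d:], rcond=None)
```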

Heavy hitters a la regression
Can assume the columns of A are orthonormal
– ||A||_F^2 = d
Let T be any set of size O(d^2) containing all row indexes i in [n] for which the row A_i has squared norm Ω(1/d)
– “Heavy hitter rows”
Suffices to ensure:
– Heavy hitters do not collide – perfect hashing
– Smaller elements concentrate
This gives a sparse dimensionality reduction matrix with m = poly(d) rows
– Clarkson-Woodruff: m = O~(d^2)
– Nelson-Nguyen: m = O~(d)
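A quick numerical check of the “heavy hitter rows” notion: after orthonormalizing the columns, the rows with squared norm at least 1/d number at most d^2; the threshold and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 10                                  # illustrative sizes

# Orthonormalize the columns, so ||A||_F^2 = d
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))

row_norms_sq = np.sum(Q ** 2, axis=1)            # these sum to d
heavy = np.flatnonzero(row_norms_sq >= 1.0 / d)  # candidate "heavy hitter" rows
print(len(heavy), "heavy rows; crude bound d^2 =", d * d)
```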

Sparse Fourier Transform

Fourier Transform
Discrete Fourier Transform:
– Given: a signal x[1…n]
– Goal: compute the frequency vector x’ where x’_f = Σ_t x_t e^{-2πi tf/n}
Very useful tool:
– Compression (audio, image, video)
– Data analysis
– Feature extraction
– …
See the SIGMOD’04 tutorial “Indexing and Mining Streams” by C. Faloutsos
[Slide figures: sampled audio data (time domain); DFT of the audio samples (frequency domain)]
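The formula above matches numpy's FFT sign convention, so a direct check is a few lines (the size is illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
x = rng.standard_normal(n)

# Direct evaluation of x'_f = sum_t x_t * exp(-2*pi*i*t*f/n)
t = np.arange(n)
xp = np.array([np.sum(x * np.exp(-2j * np.pi * t * f / n)) for f in range(n)])

# The FFT computes the same vector in O(n log n) time
print(np.allclose(xp, np.fft.fft(x)))   # True
```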

Computing DFT
Fast Fourier Transform (FFT) computes the frequencies in time O(n log n)
But we can do (much) better if we only care about a small number k of “dominant frequencies”
– E.g., recover x’ assuming it is k-sparse (only k non-zero entries)
Algorithms:
– Boolean cube (Hadamard Transform): [GL’89], [KM’93], [L’93]
– Complex FT: [Mansour’92, GGIMS’02, AGS’03, GMS’05, Iwen’10, Akavia’10, HIKP’12, HIKP’12b, BCGLS’12, LWC’12, GHIKPL’13, …]
Best running times [Hassanieh-Indyk-Katabi-Price’12]:
– Exactly k-sparse signals: O(k log n)
– Approx. k-sparse signals*: O(k log n * log(n/k))
*L2/L2 guarantee

Intuition: Fourier
[Slide figures: a signal in the time and frequency domains; cutting the time signal off to its first B samples, and the resulting frequency domain]

Main task
We would like this … to act like this:
[Slide figures]

Issues
– “Leaky” buckets
– “Hashing”: needs a random hashing of the spectrum
…
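A small numerical illustration of the bucketing ingredient: subsampling the time signal aliases groups of frequencies (those equal mod B) into B buckets. This is only the “hashing” step, not a full sparse FFT; the B/n normalization follows from the DFT convention used earlier, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, B, k = 1024, 16, 3                      # illustrative sizes; B divides n

# A k-sparse spectrum and its time-domain signal
freqs = rng.choice(n, size=k, replace=False)
X = np.zeros(n, dtype=complex)
X[freqs] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
x = np.fft.ifft(X)

# "Hash" the spectrum into B buckets by subsampling the time signal
y = x[:: n // B]                           # B evenly spaced time samples
buckets = np.fft.fft(y)                    # bucket f' collects frequencies f ≡ f' (mod B)

# Check: each bucket equals (B/n) times the sum of the coefficients hashed into it
expected = np.zeros(B, dtype=complex)
for f in range(n):
    expected[f % B] += X[f]
print(np.allclose(buckets, (B / n) * expected))   # True
```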

Filters: boxcar filter (used in [GGIMS’02, GMS’05])
Boxcar -> Sinc
– Polynomial decay
– Leaking to many buckets

Filters: Gaussian
Gaussian -> Gaussian
– Exponential decay
– Leaking to (log n)^{1/2} buckets
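A rough comparison of how the two window shapes leak across frequency; the window widths and the probed offsets are arbitrary illustrative choices.

```python
import numpy as np

n, B = 4096, 64
t = np.arange(n)

boxcar = (t < B).astype(float)                       # boxcar window of width B
sigma = B / 6.0
gauss = np.exp(-0.5 * ((t - B / 2) / sigma) ** 2)    # Gaussian window of comparable width

def leakage(w, offsets):
    """Window's DFT magnitude, relative to its peak, at the given frequency offsets."""
    W = np.abs(np.fft.fft(w))
    return [W[off] / W.max() for off in offsets]

# Offsets of roughly 1.5, 4.5 and 16.5 bucket-widths (avoiding the sinc nulls)
offsets = (3 * n // (2 * B), 9 * n // (2 * B), 33 * n // (2 * B))
print("boxcar  :", leakage(boxcar, offsets))   # sidelobes decay only polynomially
print("gaussian:", leakage(gauss, offsets))    # sidelobes decay exponentially
```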

Conclusions
Sketching via hashing
– Simple technique
– Powerful implications
Questions:
– What is next?