
(1+ε)-Approximate Sparse Recovery
Eric Price (MIT), David Woodruff (IBM Almaden)

Compressed Sensing
Choose an r × n matrix A
Given x ∈ R^n, compute Ax
Output a vector y so that |x − y|_p ≤ (1+ε) |x − x_{top k}|_p
– x_{top k} is the k-sparse vector of the largest-magnitude coefficients of x
– p = 1 or p = 2
Minimize the number r = r(n, k, ε) of measurements
Pr_A[recovery succeeds] > 2/3
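To make the guarantee concrete, here is a minimal checker (an illustration, not part of the paper; the names topk_residual and satisfies_guarantee are mine) that tests whether a candidate output y meets the (1+ε) bound for given x, k, and p.

```python
import numpy as np

def topk_residual(x, k, p):
    """Return |x - x_topk|_p, where x_topk keeps the k largest-magnitude entries."""
    idx = np.argsort(np.abs(x))[::-1][:k]
    tail = x.copy()
    tail[idx] = 0.0
    return np.linalg.norm(tail, ord=p)

def satisfies_guarantee(x, y, k, eps, p=1):
    """Check |x - y|_p <= (1 + eps) * |x - x_topk|_p."""
    return np.linalg.norm(x - y, ord=p) <= (1 + eps) * topk_residual(x, k, p)

# Example: the all-zeros output fails once the top-k residual is small.
x = np.zeros(1000); x[0] = 1.0
x += 0.001 * np.random.randn(1000)
print(satisfies_guarantee(x, np.zeros_like(x), k=1, eps=0.1, p=2))   # False
```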

Previous Work
p = 1 [IR, …]: r = O(k log(n/k) / ε) (deterministic A)
p = 2 [GLPS]: r = O(k log(n/k) / ε)
In both cases, r = Ω(k log(n/k)) [DIPW]
What is the dependence on ε?

Why 1+ε is Important
Suppose x = e_i + u
– e_i = (0, 0, …, 0, 1, 0, …, 0)
– u is a random unit vector orthogonal to e_i
Consider y = 0^n
– |x − y|_2 = |x|_2 ≤ 2^{1/2} · |x − e_i|_2
It's a trivial solution! (1+ε)-approximate recovery fixes this
In some applications one can have 1/ε = 100 while log n = 32

Our Results vs. Previous Work
p = 1: [IR, …] r = O(k log(n/k) / ε); we get r = O(k log(n/k) · log²(1/ε) / ε^{1/2}) (randomized) and r = Ω(k log(1/ε) / ε^{1/2})
p = 2: [GLPS] r = O(k log(n/k) / ε); we get r = Ω(k log(n/k) / ε)
Previous lower bounds: Ω(k log(n/k))
Our lower bounds hold for randomized schemes with constant success probability

Comparison to Deterministic Schemes
We get an r = O~(k/ε^{1/2}) randomized upper bound for p = 1
We show an Ω(k log(n/k) / ε) lower bound for p = 1 for deterministic schemes
So randomized recovery is easier than deterministic

Our Sparse-Output Results
Output a vector y from Ax so that |x − y|_p ≤ (1+ε) |x − x_{top k}|_p
Sometimes we also want y to be k-sparse
In that case r = Ω(k/ε^p)
Both results are tight up to logarithmic factors
Recall that for non-sparse output r = Θ~(k/ε^{p/2})

Talk Outline
1. O~(k/ε^{1/2}) upper bound for p = 1
2. Lower bounds

Simplifications
Want O~(k/ε^{1/2}) for p = 1
Replace k with 1:
– Sample a 1/k fraction of the coordinates
– Solve the problem for k = 1 on the sample
– Repeat O~(k) times independently
– Combine the solutions found
(Slide figure: the vector (ε/k, ε/k, …, ε/k, 1/n, 1/n, …, 1/n) is subsampled to (ε/k, 1/n, …, 1/n), leaving roughly one large coordinate per sample.)
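A minimal sketch of this reduction, assuming a black-box k = 1 solver; recover_k1 and reduce_k_to_1 are hypothetical names, and the code operates on x directly rather than on linear measurements, purely for illustration.

```python
import numpy as np

def recover_k1(x_sub):
    """Placeholder for a k = 1 recovery routine; here it simply returns the
    largest-magnitude coordinate of the subsampled vector."""
    i = int(np.argmax(np.abs(x_sub)))
    return {i: x_sub[i]}

def reduce_k_to_1(x, k, reps):
    """Run the k = 1 solver on O~(k) independent 1/k-rate subsamples and combine."""
    n = len(x)
    found = {}
    for _ in range(reps):
        mask = np.random.rand(n) < 1.0 / k      # keep each coordinate w.p. 1/k
        x_sub = np.where(mask, x, 0.0)          # zero out the unsampled coordinates
        found.update(recover_k1(x_sub))         # merge the solutions found
    y = np.zeros(n)
    for i, v in found.items():
        y[i] = v
    return y
```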

k = 1
Assume |x − x_top|_1 = 1 and x_top = ε
First attempt:
– Use CountMin [CM]
– Randomly partition coordinates into B buckets via a hash h; maintain the sum Σ_{i : h(i) = j} x_i in bucket j
The expected ℓ1-mass of noise in a bucket is 1/B
If B = Θ(1/ε), most buckets have noise at most ε/2
Repeat O(log n) times
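A minimal CountMin-style row for one repetition, under the assumptions above (function names are mine, not the paper's): B buckets, each holding the sum of the coordinates hashed into it; repeating O(log n) independent rows and combining the estimates (e.g., by a median) drives down the failure probability.

```python
import numpy as np

def countmin_row(x, B, seed=0):
    """One CountMin row: hash each coordinate into one of B buckets and sum."""
    rng = np.random.default_rng(seed)
    h = rng.integers(0, B, size=len(x))     # random bucket for each coordinate
    counts = np.zeros(B)
    np.add.at(counts, h, x)                 # counts[j] = sum of x_i with h(i) = j
    return h, counts

def estimate(i, h, counts):
    """Estimate x_i by its bucket count; the error is the other mass hashed there."""
    return counts[h[i]]

# With B = Theta(1/eps) buckets, the bucket holding the top coordinate has count
# about eps, while a typical bucket only holds about 1/B of the tail's l1-mass.
```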

Second Attempt
But we wanted O~(1/ε^{1/2}) measurements
CountMin's error in a bucket is 1/B, so we need B ≈ 1/ε
What about CountSketch? [CCF-C]
– Give each coordinate i a random sign σ(i) ∈ {−1, +1}
– Randomly partition coordinates into B buckets; maintain Σ_{i : h(i) = j} σ(i) · x_i in the j-th bucket (e.g., bucket 2 stores Σ_{i : h(i) = 2} σ(i) · x_i)
– Bucket error is (Σ_{i ∉ top} x_i² / B)^{1/2}
– Is this better?
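A matching CountSketch row, again an illustrative sketch with names of my choosing; the only change from CountMin is the random sign, which makes the bucket noise an ℓ2-type quantity rather than an ℓ1-type one.

```python
import numpy as np

def countsketch_row(x, B, seed=0):
    """One CountSketch row: random signs sigma(i) and random buckets h(i)."""
    rng = np.random.default_rng(seed)
    h = rng.integers(0, B, size=len(x))              # bucket for each coordinate
    sigma = rng.choice([-1.0, 1.0], size=len(x))     # random sign for each coordinate
    counts = np.zeros(B)
    np.add.at(counts, h, sigma * x)                  # counts[j] = sum_{h(i)=j} sigma(i) x_i
    return h, sigma, counts

def estimate(i, h, sigma, counts):
    """Estimate x_i; the noise in its bucket has magnitude about (|tail|_2^2 / B)^(1/2)."""
    return sigma[i] * counts[h[i]]
```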

CountSketch
Bucket error: Err = (Σ_{i ∉ top} x_i² / B)^{1/2}
All |x_i| ≤ ε and |x − x_top|_1 = 1
So Σ_{i ∉ top} x_i² ≤ (1/ε) · ε² = ε (at most 1/ε tail coordinates can have magnitude as large as ε)
Hence Err ≤ (ε/B)^{1/2}, which needs to be at most ε
Solving, B ≥ 1/ε
So CountSketch isn't better than CountMin

Main Idea
We insist on using CountSketch with B = 1/ε^{1/2} buckets
Suppose Err = (Σ_{i ∉ top} x_i² / B)^{1/2} = ε, i.e., CountSketch fails to isolate x_top
This means Σ_{i ∉ top} x_i² = ε^{3/2}
Forget about x_top! Let's make up the mass another way

Main Idea (continued)
We have: Σ_{i ∉ top} x_i² = ε^{3/2}
Intuition: suppose all x_i with i ∉ top are equal to some common value or 0
Then: (# non-zero) · value = 1 and (# non-zero) · value² = ε^{3/2}
Hence value = ε^{3/2} and # non-zero = 1/ε^{3/2}
Sample an ε-fraction of the coordinates uniformly at random!
– value = ε^{3/2} and # non-zero sampled = 1/ε^{1/2}, so the ℓ1-contribution of the sample is ε
– Find all the non-zeros with O~(1/ε^{1/2}) measurements

General Setting
Σ_{i ∉ top} x_i² = ε^{3/2}
Level sets: S_j = {i | 1/4^j < x_i² ≤ 1/4^{j-1}}
Σ_{i ∉ top} x_i² = ε^{3/2} implies there is a j for which |S_j|/4^j = Ω~(ε^{3/2})
(Slide figure: the tail is grouped into geometric level sets, with per-level masses ε^{3/2}, 4ε^{3/2}, 16ε^{3/2}, ….)

General Setting (continued)
If |S_j| < 1/ε^{1/2}: then 1/4^j > ε², so 1/2^j > ε — can't happen, since every tail coordinate has |x_i| ≤ ε
Else, sample at rate 1/(|S_j| · ε^{1/2}) to get about 1/ε^{1/2} elements of S_j
The ℓ1-mass of S_j in the sample is > ε
Can we find the sampled elements of S_j? Use Σ_{i ∉ top} x_i² = ε^{3/2}
The ℓ2² of the sampled tail is about ε^{3/2} · 1/(|S_j| · ε^{1/2}) = ε/|S_j|
Using CountSketch with 1/ε^{1/2} buckets: bucket error = sqrt(ε^{1/2} · ε^{3/2} · 1/(|S_j| · ε^{1/2})) = sqrt(ε^{3/2}/|S_j|) ≤ 1/2^j, so the sampled elements of S_j stand out

Algorithm Wrap-up
Sub-sample O(log 1/ε) times, at rates decreasing in powers of 2
At each level of sub-sampling, maintain a CountSketch with O~(1/ε^{1/2}) buckets
Find as many heavy coordinates as you can!
Intuition: if CountSketch fails, there are many moderately heavy elements, and those can be found by sub-sampling
This wouldn't work for CountMin: its bucket error could be ε because of n−1 items each of value ε/(n−1)
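Putting the pieces together, a compact sketch of the k = 1 recovery structure described above (illustrative code with hypothetical names, operating on x directly rather than on stored linear measurements): O(log 1/ε) sub-sampling levels, each summarized by a CountSketch with about 1/ε^{1/2} buckets, from which candidate heavy coordinates are read off.

```python
import numpy as np

def build_sketches(x, eps, seed=0):
    """One CountSketch with ~1/sqrt(eps) buckets per sub-sampling level."""
    rng = np.random.default_rng(seed)
    n = len(x)
    B = max(1, int(np.ceil(1.0 / np.sqrt(eps))))
    levels = []
    for level in range(int(np.ceil(np.log2(1.0 / eps))) + 1):
        keep = rng.random(n) < 2.0 ** (-level)        # sub-sample at rate 2^-level
        h = rng.integers(0, B, size=n)
        sigma = rng.choice([-1.0, 1.0], size=n)
        counts = np.zeros(B)
        np.add.at(counts, h[keep], sigma[keep] * x[keep])
        levels.append((keep, h, sigma, counts))
    return levels

def recover_heavy(levels, n, thresh):
    """Report coordinates whose CountSketch estimate exceeds thresh at some level."""
    found = {}
    for keep, h, sigma, counts in levels:
        for i in range(n):
            if keep[i] and i not in found:
                est = sigma[i] * counts[h[i]]
                if abs(est) > thresh:
                    found[i] = est
    return found
```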

Talk Outline
1. O~(k/ε^{1/2}) upper bound for p = 1
2. Lower bounds

Our Results (Lower Bounds)
General results:
– Ω~(k/ε^{1/2}) for p = 1
– Ω(k log(n/k) / ε) for p = 2
Sparse output:
– Ω~(k/ε) for p = 1
– Ω~(k/ε²) for p = 2
Deterministic:
– Ω(k log(n/k) / ε) for p = 1

Simultaneous Communication Complexity
Alice holds x and sends a single message M_A(x) to the referee; Bob holds y and sends a single message M_B(y)
The referee must output f(x, y) with constant probability
The communication cost CC(f) is the maximum message length, over the randomness of the protocol and all possible inputs
The parties share randomness
What is f(x, y)?

Reduction to Compressed Sensing
The shared randomness decides the matrix A
Alice sends Ax to the referee; Bob sends Ay to the referee
The referee computes Ax + Ay = A(x+y) and runs the compressed sensing recovery algorithm on it
If the algorithm's output solves f(x, y), then (# rows of A) · (# bits per measurement) > CC(f)
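A schematic of this reduction in code (illustrative only; the recover argument is a stand-in for a compressed sensing decoder, and measurement discretization is ignored here): the simultaneous protocol for f just forwards the two sketches and lets the referee decode their sum.

```python
import numpy as np

def make_matrix(shared_seed, r, n):
    """Both parties derive the same measurement matrix A from shared randomness."""
    rng = np.random.default_rng(shared_seed)
    return rng.standard_normal((r, n)) / np.sqrt(r)

def alice_message(A, x):
    return A @ x              # Alice's single message: her sketch

def bob_message(A, y):
    return A @ y              # Bob's single message: his sketch

def referee(msg_a, msg_b, A, recover):
    """The referee adds the sketches (linearity) and runs sparse recovery."""
    z_hat = recover(A, msg_a + msg_b)   # approximates x + y
    return z_hat                        # the referee then answers f(x, y) from z_hat

# If this succeeds with constant probability, the total protocol length,
# r * (bits per measurement), must exceed CC(f).
```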

A Unified View
General results: Direct Sum + Gap-ℓ_1
– Ω~(k/ε^{1/2}) for p = 1
– Ω~(k/ε) for p = 2
Sparse output: Indexing
– Ω~(k/ε) for p = 1
– Ω~(k/ε²) for p = 2
Deterministic: Equality
– Ω(k log(n/k) / ε) for p = 1
Tighter log factors are achievable by looking at Gaussian channels

General Results: k = 1, p = 1
Alice and Bob have x and y, respectively, in R^m
There is a unique i* for which (x+y)_{i*} = d
For all j ≠ i*, (x+y)_j ∈ {0, c, −c}, where |c| < |d|
Finding i* requires Ω(m/(d/c)²) communication [SS, BJKS]
Set m = 1/ε^{3/2}, c = ε^{3/2}, d = ε
So Ω(1/ε^{1/2}) communication is needed

General Results: k = 1, p = 1 (continued)
But the compressed sensing algorithm doesn't need to find i*
If it doesn't, then it needs to transmit a lot of information about the tail
– The tail is a random low-weight vector in {0, ε^{3/2}, −ε^{3/2}}^{1/ε^3}
– Uses a distributional lower bound and Reed-Solomon codes
It must output a vector y within (1−ε) of the tail in ℓ1-norm
This needs Ω(1/ε^{1/2}) communication

General Results: k = 1, p = 2
Same argument, different parameters
Ω(1/ε) communication
What about general k?

Handling General k
Bounded-Round Direct Sum Theorem [BR] (with a slight modification): given k copies of a function f, with input pairs drawn independently from a distribution μ, solving a 2/3 fraction of the copies needs communication Ω(k · CC_μ(f))
(Slide figure: the hard instance for p = 1 consists of k independent copies of the k = 1 instance, suitably scaled.)

Handling General k (continued)
CC = Ω(k/ε^{1/2}) for p = 1
CC = Ω(k/ε) for p = 2
What does this imply about compressed sensing?

Rounding Matrices [DIPW]
A is a matrix of real numbers; can assume it has orthonormal rows
Round the entries of A to O(log n) bits, obtaining a matrix A'
Careful:
– A'x = A(x + s) for some small s
– But s depends on A, so there is no guarantee that recovery works
– Can be fixed by looking at A(x + s + u) for a random u
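A small illustration of the rounding step (code and names are mine): entries of A are quantized, and for A with orthonormal rows the identity A'x = A(x + s), with s = Aᵀ(A' − A)x, can be checked directly.

```python
import numpy as np

def round_matrix(A, bits):
    """Quantize each entry of A to `bits` bits after the binary point (O(log n) bits)."""
    scale = 2.0 ** bits
    return np.round(A * scale) / scale

def perturbation(A, A_rounded, x):
    """For A with orthonormal rows, A_rounded @ x = A @ (x + s) with s = A.T (A_rounded - A) x."""
    return A.T @ ((A_rounded - A) @ x)

# Sanity check of the identity on a random matrix with orthonormal rows.
rng = np.random.default_rng(0)
A = np.linalg.qr(rng.standard_normal((20, 10)))[0].T    # 10 orthonormal rows in R^20
x = rng.standard_normal(20)
A_r = round_matrix(A, bits=12)
s = perturbation(A, A_r, x)
print(np.allclose(A_r @ x, A @ (x + s)))                # True
```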

Lower Bounds for Compressed Sensing
(# rows of A) · (# bits per measurement) > CC(f)
By rounding, # bits per measurement = O(log n)
In our hard instances, the universe size is poly(k/ε)
So (# rows of A) · O(log(k/ε)) > CC(f)
Hence # rows of A = Ω~(k/ε^{1/2}) for p = 1
and # rows of A = Ω~(k/ε) for p = 2

Sparse-Output Results
Via Indexing:
– Ω~(k/ε) for p = 1
– Ω~(k/ε²) for p = 2

Sparse Output Results - Indexing
Alice has x ∈ {0,1}^n; Bob has an index i ∈ {1, 2, …, n}
What is x_i?
CC(Indexing) = Ω(n)

Ω(1/ε) Bound for k = 1, p = 1
Alice has x ∈ {−ε, ε}^{1/ε}; Bob has y = e_i
Consider x + y
If the output is required to be 1-sparse, it must place its mass on the i-th coordinate
That mass must be 1+ε if x_i = ε, otherwise 1−ε, so the referee learns x_i
Generalizes to k > 1 to give Ω~(k/ε)
Generalizes to p = 2 to give Ω~(k/ε²)
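A tiny numerical illustration of this instance (illustrative code, my names): reading off the mass that a good 1-sparse output places on coordinate i reveals the bit x_i, which is how the reduction from Indexing works.

```python
import numpy as np

eps, n = 0.05, 20                       # n = 1/eps coordinates
rng = np.random.default_rng(1)
x = rng.choice([-eps, eps], size=n)     # Alice's vector encodes n "bits"
i = 7                                   # Bob's index
z = x.copy(); z[i] += 1.0               # z = x + e_i

# Any good 1-sparse approximation of z concentrates on coordinate i,
# and its value is close to 1 + x[i]; thresholding at 1 recovers the bit.
one_sparse_value = z[i]                 # ideal 1-sparse output puts z[i] on coordinate i
decoded_bit = eps if one_sparse_value > 1.0 else -eps
print(decoded_bit == x[i])              # True
```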

Deterministic Results
Via Equality:
– Ω(k log(n/k) / ε) for p = 1

Deterministic Results - Equality
Alice has x ∈ {0,1}^n; Bob has y ∈ {0,1}^n
Is x = y?
Deterministic CC(Equality) = Ω(n)

Ω(k log(n/k) / ε) for p = 1
Choose log n signals x^1, …, x^{log n}, each with k/ε values equal to ε/k
Set x = Σ_{i=1}^{log n} 10^i · x^i
Choose log n signals y^1, …, y^{log n}, each with k/ε values equal to ε/k
Set y = Σ_{i=1}^{log n} 10^i · y^i
Consider x − y
The compressed sensing output is 0^n iff x = y
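A sketch of this encoding (illustrative, with hypothetical names such as encode and block_seeds): each party packs log n blocks weighted by powers of 10, so the difference x − y is the all-zeros vector exactly when the inputs agree, which is what lets a deterministic scheme decide Equality.

```python
import numpy as np

def encode(block_seeds, n, k, eps):
    """Each block places value eps/k on k/eps coordinates; blocks are weighted by 10^i."""
    x = np.zeros(n)
    m = int(round(k / eps))                          # nonzeros per block
    for i, seed in enumerate(block_seeds, start=1):
        block_rng = np.random.default_rng(seed)
        support = block_rng.choice(n, size=m, replace=False)
        signal = np.zeros(n)
        signal[support] = eps / k
        x += (10.0 ** i) * signal                    # weight block i by 10^i
    return x

# Alice encodes her input as block_seeds_A, Bob his as block_seeds_B (log n blocks each);
# encode(block_seeds_A, ...) - encode(block_seeds_B, ...) is all zeros exactly when
# the two inputs agree block by block.
```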

General Results - Gaussian Channels (k = 1, p = 2)
Alice has a signal x = ε^{1/2} e_i for a uniformly random i ∈ [n]
Alice transmits x over a noisy channel with independent N(0, 1/n) noise on each coordinate
Consider any row vector a of A
The channel output for this row is ⟨a, x⟩ + z, where z is N(0, |a|_2²/n)
E_i[⟨a, x⟩²] = ε |a|_2²/n
Shannon-Hartley Theorem: I(i; ⟨a, x⟩ + z) ≤ I(⟨a, x⟩; ⟨a, x⟩ + z) ≤ ½ log(1 + ε) = O(ε)
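Making the per-measurement information bound explicit, here is the calculation as I read it off the slide (a sketch of the standard Gaussian-channel argument; the final counting step is my paraphrase):

```latex
\[
\mathrm{SNR} \;=\; \frac{\mathbb{E}_i\big[\langle a, x\rangle^2\big]}{\mathbb{E}\big[z^2\big]}
\;=\; \frac{\varepsilon \,\|a\|_2^2 / n}{\|a\|_2^2 / n} \;=\; \varepsilon,
\qquad
I\big(i;\ \langle a, x\rangle + z\big)
\;\le\; I\big(\langle a, x\rangle;\ \langle a, x\rangle + z\big)
\;\le\; \tfrac{1}{2}\log\!\big(1+\varepsilon\big) \;=\; O(\varepsilon).
\]
% Summing over the r rows of A gives O(r * eps) bits of information about i,
% while identifying i requires Omega(log n) bits, suggesting r = Omega(log(n)/eps).
```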

Summary of Results
General results: Θ~(k/ε^{p/2})
Sparse output: Θ~(k/ε^p)
Deterministic: Θ(k log(n/k) / ε) for p = 1