Fast Johnson-Lindenstrauss Transform(s) Nir Ailon Edo Liberty, Bernard Chazelle Bertinoro Workshop on Sublinear Algorithms May 2011.




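JL – Distribution Version
Find a random mapping $\Phi$ from $\mathbb{R}^n$ to $\mathbb{R}^k$ ($n$ big, $k$ small) such that for every $x \in \mathbb{R}^n$ with $\|x\|_2 = 1$, with probability at least $1 - \exp\{-k\varepsilon^2\}$: $\|\Phi x\|_2 = 1 \pm O(\varepsilon)$, where $\varepsilon > 0$.
$k$ is tight for this probabilistic guarantee [Jayram, Woodruff 2011].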

JL – Metric Embedding Version
If you have $N$ vectors $x_1, \dots, x_N \in \mathbb{R}^n$: set $k = O(\varepsilon^{-2} \log N)$.
By union bound: for all $i, j$, $\|\Phi x_i - \Phi x_j\| = (1 \pm O(\varepsilon))\,\|x_i - x_j\|$, a low-distortion metric embedding.
Target dimension $k$ is almost tight [Alon 2003].
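The union bound step can be spelled out as follows. This is a sketch that assumes the distributional guarantee holds with failure probability at most $\exp\{-ck\varepsilon^2\}$ for some constant $c > 0$ (the constant $c$ is an assumption for illustration; the slides suppress constants). Since $\Phi$ is linear, the guarantee applies to each normalized difference $(x_i - x_j)/\|x_i - x_j\|$.

```latex
\Pr\bigl[\exists\, i<j:\ \|\Phi x_i - \Phi x_j\| \ne (1 \pm O(\varepsilon))\,\|x_i - x_j\|\bigr]
  \;\le\; \binom{N}{2}\exp\{-ck\varepsilon^2\}
  \;<\; \frac{N^2}{2}\exp\{-ck\varepsilon^2\}
  \;\le\; \frac{1}{2}
  \qquad\text{whenever}\qquad
  k \;\ge\; \frac{2\ln N}{c\,\varepsilon^2}.
```

So $k = O(\varepsilon^{-2}\log N)$ suffices for all $\binom{N}{2}$ pairwise distances to be preserved simultaneously with probability at least $1/2$.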

Solution: Johnson-Lindenstrauss (JL)
$\Phi$ = dense random $k \times n$ matrix.
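As an illustration of the dense construction, here is a minimal sketch that assumes i.i.d. Gaussian entries with variance $1/k$ (the slide does not say which entry distribution is used, so the Gaussian choice, the function name `dense_jl`, and the sizes below are assumptions). Applying the matrix costs on the order of $kn$ operations, which is the cost the fast transforms try to beat.

```python
import numpy as np

def dense_jl(n, k, rng):
    """Return a dense k x n matrix with i.i.d. N(0, 1/k) entries (assumed distribution)."""
    return rng.standard_normal((k, n)) / np.sqrt(k)

rng = np.random.default_rng(0)
n, k = 4096, 256                     # illustrative sizes, not from the slides
phi = dense_jl(n, k, rng)

x = rng.standard_normal(n)
x /= np.linalg.norm(x)               # unit vector, as in the distributional statement

y = phi @ x                          # order k*n multiply-adds
print("norm after projection:", np.linalg.norm(y))
```

The printed norm should be close to 1, with fluctuations that shrink as `k` grows.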

So what's the problem?
Running time: order $kn$.
Number of random bits: order $kn$.
Can we do better?

Fast JL (Ailon, Chazelle 2006)
$\Phi$ = Sparse · Hadamard · Diagonal, a $k \times n$ mapping in which the Hadamard factor is a Fourier transform.
Time = $O(k^3 + n \log n)$, randomness = $O(k^3 \log n + n)$.
Beats the JL $kn$ bound for $\log n < k < n^{1/2}$.

Improvement (Ailon, Liberty 2008)
$O(n \log n)$ time for $k < n^{1/2}$, using $O(n)$ random bits.

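Algorithm (works for $k = O(d^{1/2})$) (Ailon, Liberty 2007)
$\Phi = B \cdot D_1 \cdot H \cdot D_2 \cdot H \cdot D_3 \cdots$
$B$: a $k \times n$ matrix built from an error-correcting code.
$D_i$: diagonal matrices.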

Algorithm (works for $k = O(d^{1/2})$): analysis of $B$
Assume $D_1 = \mathrm{diag}(\xi_1, \dots, \xi_d)$. Then $BD_1x = \sum_i x_i B^{(i)} \xi_i$ is a Rademacher r.v. in $k$ dimensions.
Tail of $Z = \|BD_1x\|_2$ bounded using Talagrand: $\Pr[|Z - \mu| > t] \le \exp\{-t^2/\sigma^2\}$, with $\sigma = \|B\,\mathrm{diag}(x)\|_{2 \to 2} \le \|B^t\|_{2 \to 4}\,\|x\|_4$ (Cauchy-Schwarz).
Each element of $B$ is $\pm 1/\sqrt{k}$.
Row set is a subset of rows from Hadamard, giving $O(n \log n)$ runtime.
Columns are 4-wise independent, so $\|B^t\|_{2 \to 4} = O(1)$.
Best we could hope for: $\|x\|_4 = d^{-1/4} = k^{-1/2}$, so $\sigma \le O(1)\,k^{-1/2}$.

Algorithm (works for $k = O(d^{1/2})$): analysis of $HD_i$
$HD_ix$ is a Rademacher r.v. in $d$ dimensions.
$Z = \|HD_ix\|_4$ bounded using Talagrand: $\Pr[|Z - \mu| > t] \le \exp\{-t^2/\sigma^2\}$, with $\sigma \le \|H\|_{4/3 \to 4}\,\|x\|_4$ (Cauchy-Schwarz).
Use Hausdorff-Young and the assumption on $k$ to make progress at each $i$.
$k = d^{1/2 - \delta}$.

In the meantime…
Compressed sensing for sparse signal recovery.
Find a $k \times n$ mapping $\Phi$ s.t. the equation $y = \Phi x$ can be efficiently solved exactly for $s$-sparse $x$.
R.I.P. property is sufficient (Candes + Tao): $\|\Phi x\| \approx \|x\|$ for $s$-sparse $x$.
You also want $\Phi$ to be efficiently computable for the recovery algorithm.

Why J.L. $\Rightarrow$ R.I.P.
Number of $s$-sparse $x$'s is roughly $\exp\{s \log n\}$ [Baraniuk].
Therefore $k \approx s \log n / \varepsilon^2$ measurements are enough, using distributional JL, to get $(1+\varepsilon)$-R.I.P. for $s$-sparse vectors.
But fast R.I.P. was known without restriction on $k$:
  - Rudelson, Vershynin: take $k \log^3 n$ randomly chosen rows from the Fourier transform.
  - No restriction of the form $k < n^{1/2}$.
Does R.I.P. $\Rightarrow$ J.L.?
  - That would be a good way to kill the restriction on $k$!
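The counting behind the first two points can be sketched as below. This is a hedged outline, not the exact argument of [Baraniuk]: it assumes a failure probability of $\exp\{-ck\varepsilon^2\}$ per vector (constant $c$ assumed), a $\delta$-net of size at most $(3/\delta)^s$ on the unit sphere of each $s$-dimensional coordinate subspace (with $\delta \le 1$ fixed), and it omits the standard step that extends the guarantee from the net to all $s$-sparse vectors.

```latex
\#\{\text{net points}\}
  \;\le\; \binom{n}{s}\Bigl(\frac{3}{\delta}\Bigr)^{s}
  \;\le\; \Bigl(\frac{en}{s}\Bigr)^{s}\Bigl(\frac{3}{\delta}\Bigr)^{s}
  \;=\; \exp\Bigl\{ s\log\frac{en}{s} + s\log\frac{3}{\delta} \Bigr\},
\qquad
\exp\Bigl\{ s\log\frac{en}{s} + s\log\frac{3}{\delta} \Bigr\}\cdot \exp\{-ck\varepsilon^{2}\} \;<\; 1
\quad\Longleftarrow\quad
k \;>\; \frac{s\log(en/s) + s\log(3/\delta)}{c\,\varepsilon^{2}}.
```

For constant $\delta$ the right-hand side is $O(s \log n / \varepsilon^2)$, matching the measurement count quoted on the slide.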

Rudelson + Vershynin's R.I.P.
If $\Phi$ is a random choice of $k = s\,t \log^4 n$ rows from the Fourier (Hadamard) matrix, then with constant probability $\Phi$ is $(1/\sqrt{t})$-R.I.P. for $s$-sparse vectors.

Rudelson + Vershynin's R.I.P. $\Rightarrow$ (almost) metric J.L.
Analysis is not black box.
Had to extend nitty-gritty details.

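The Transformation
$x \mapsto k^{-1/2}\,\Phi D x$, where $\Phi$ is a $k \times n$ matrix of rows from the (unnormalized, $\pm 1$) Hadamard matrix and $D$ is an $n \times n$ diagonal matrix.
$k = O(\log^4 n \,\log N / \varepsilon^4)$ with no restriction on $k$.

The Analysis
Split $x$ into its $s = \log N / \varepsilon^2$ heaviest coordinates and its $n - s$ lightest coordinates; the lightest are bounded by $1/\sqrt{s} = \varepsilon/\sqrt{\log N}$.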
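A minimal sketch of this kind of transformation, for illustration only. It assumes that $D$ carries independent random $\pm 1$ signs, that the $k$ rows are sampled uniformly without replacement, and that $n$ is a power of two; the helper names `fwht` and `subsampled_hadamard` are hypothetical. The unnormalized Hadamard product is computed with a fast Walsh-Hadamard transform in $O(n \log n)$ time instead of forming the matrix.

```python
import numpy as np

def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform (Sylvester ordering), O(n log n)."""
    out = np.array(v, dtype=float)
    n = out.shape[0]
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        blocks = out.reshape(-1, 2, h)          # adjacent halves of length h
        top = blocks[:, 0, :] + blocks[:, 1, :]
        bottom = blocks[:, 0, :] - blocks[:, 1, :]
        out = np.stack([top, bottom], axis=1).reshape(n)
        h *= 2
    return out

def subsampled_hadamard(x, k, rng):
    """Compute k^{-1/2} * (k sampled rows of H) * D * x with random signs on D."""
    n = x.shape[0]
    signs = rng.choice([-1.0, 1.0], size=n)         # diagonal of D (assumed Rademacher)
    rows = rng.choice(n, size=k, replace=False)     # assumed sampling scheme
    return fwht(signs * x)[rows] / np.sqrt(k)

rng = np.random.default_rng(0)
n, k = 1024, 128                                    # illustrative sizes
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
y = subsampled_hadamard(x, k, rng)
print("norm after transform:", np.linalg.norm(y))
```

With this scaling the squared norm of the output equals the squared norm of `x` in expectation over the row sample; how tightly it concentrates is what the analysis on the slides addresses.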

The Analysis (continued)
Write $x_H$ for the $s$ heaviest coordinates of $x$ and $x_L$ for the $n - s$ lightest coordinates.
$\|k^{-1/2}\,\Phi D x_H\|_2 = \|x_H\|_2\,(1 \pm O(\varepsilon))$ directly from R.V.
$\Phi D x_L$ is a Rademacher r.v.; $Z = \|k^{-1/2}\,\Phi D x_L\|_2$ is concentrated using Talagrand with $\sigma = \|k^{-1/2}\,\Phi\,\mathrm{diag}(x_L)\|_{2 \to 2}$.

The Analysis (continued)
R.V. proved that with constant probability over $\Phi$, uniformly for all vectors $x$ consisting of $s$ ones and the rest zeros, $\|k^{-1/2}\,\Phi\,\mathrm{diag}(x)\|_{2 \to 2}$ is bounded (this is R.I.P.).
The two parameters that govern the bound are:
1. $\|x\|_\infty = 1$, and
2. For any vector $y$ s.t. $\|y\|_2 = 1$: $\|\mathrm{diag}(x)\,y\|_1 \le T$.
What we have is:
1. $\|x_L\|_\infty \le \varepsilon/\sqrt{\log N}$
2. For any vector $y$ s.t. $\|y\|_2 = 1$: $\|\mathrm{diag}(x_L)\,y\|_1 \le 1 = O(\log N / \varepsilon^2)$

More…
Krahmer and Ward (2010) prove R.I.P. $\Rightarrow$ J.L. black-box!
This fixes the $\varepsilon^{-4}$ problem and replaces it with the correct $\varepsilon^{-2}$!
Proof technique: Kane, Nelson: sparse JL
Lots of work on derandomization.
Can we get rid of polylog $n$? If we go via R.I.P. then we need at least one $\log n$ factor, which J.L. doesn't seem to need.