Fast Dimension Reduction MMDS 2008


Fast Dimension Reduction, MMDS 2008. Nir Ailon, Google Research NY. Based on: "Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes" (with Edo Liberty) and "The Fast Johnson-Lindenstrauss Transform" (with Bernard Chazelle).

Original Motivation: Nearest Neighbor Searching. Wanted to improve the Indyk-Motwani algorithm for approximate NN search in Euclidean space. Evidence that this was possible came from an improvement to the Kushilevitz-Ostrovsky-Rabani algorithm for approximate NN search over GF(2). To do the same for Euclidean space, it was evident that improving the running time of the Johnson-Lindenstrauss transform was key.

Later Motivation: Provide a more elegant proof using modern techniques; the running-time improvement is obtained as a bonus. An exciting use of Talagrand concentration bounds and error-correcting codes.

Random Dimension Reduction:
- Sketching [Woodruff, Jayram, Li]
- (Existential) metric embedding: distance preserving for sets of points, subspaces, manifolds [Clarkson]; volume preserving [Magen, Zouzias]
- Fast approximate linear algebra: SVD, linear regression [Muthukrishnan, Mahoney, Drineas, Sarlos]
- Computational aspects: time [A, Chazelle, Liberty] + {sketching community} + {fast approximate linear algebra community}; randomness {functional analysis community}

Theoretical Challenge: Find a random projection Φ from R^d to R^k (d big, k small) such that for every x ∈ R^d with ||x||_2 = 1 and every 0 < ε < 1, with probability 1 - exp{-kε²}: ||Φx||_2 = 1 ± O(ε).

Usage: If you have n vectors x_1, ..., x_n ∈ R^d, set k = O(log n). By a union bound, for all i, j: ||Φx_i - Φx_j|| ≈ ||x_i - x_j||, i.e., a low-distortion metric embedding. This value of k is "tight".
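The union-bound calculation behind k = O(log n) can be written out as follows (a sketch; the constant c and the explicit ε-dependence are generic choices, not taken from the slides):

```latex
% Each pair (i,j) fails with probability exp(-c k \varepsilon^2) by the
% single-vector guarantee, so a union bound over the \binom{n}{2} pairs gives
\Pr\bigl[\exists\, i,j:\ \|\Phi x_i - \Phi x_j\|_2 \neq (1\pm\varepsilon)\,\|x_i-x_j\|_2\bigr]
  \;\le\; \binom{n}{2}\, e^{-c k \varepsilon^2}
  \;<\; \tfrac12
  \quad\text{once } k = O(\varepsilon^{-2}\log n).
```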

Solution: Johnson-Lindenstrauss (JL). Take Φ to be a dense k × d random matrix.
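A minimal numerical sketch of the dense construction (assuming numpy; Gaussian entries with scaling 1/√k are one standard choice, not necessarily the one on the slide):

```python
import numpy as np

def dense_jl(x, k, rng):
    """Dense JL: project x in R^d to R^k with a k x d Gaussian matrix."""
    d = x.shape[0]
    Phi = rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))  # E||Phi x||^2 = ||x||^2
    return Phi @ x

rng = np.random.default_rng(0)
x = rng.normal(size=4096)
x /= np.linalg.norm(x)                           # ||x||_2 = 1
print(np.linalg.norm(dense_jl(x, 256, rng)))     # concentrated around 1
```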

So what's the problem? The running time is Θ(kd), and the number of random bits is Θ(kd). Can we do better?

Fast JL [A, Chazelle 2006]: Φ = Sparse · Hadamard (Fourier) · Diagonal, where the sparse matrix is k × d. Time = O(k³ + d log d); beats the Θ(kd) JL bound for log d < k < d^{1/3}.
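A rough sketch of the Sparse · Hadamard · Diagonal structure (assuming numpy/scipy; the sparsity level q and the normalizations are illustrative, and the Hadamard multiply is done densely here for clarity rather than with the O(d log d) fast transform):

```python
import numpy as np
from scipy.linalg import hadamard

def fjlt_sketch(x, k, q, rng):
    d = x.shape[0]                                   # d must be a power of 2
    Dx = rng.choice([-1.0, 1.0], size=d) * x         # Diagonal: random signs
    HDx = hadamard(d) @ Dx / np.sqrt(d)              # Hadamard (orthonormal)
    keep = rng.random((k, d)) < q                    # Sparse: k x d, density q
    P = np.where(keep, rng.normal(scale=1.0 / np.sqrt(q), size=(k, d)), 0.0)
    return P @ HDx / np.sqrt(k)

rng = np.random.default_rng(0)
x = rng.normal(size=1024); x /= np.linalg.norm(x)
print(np.linalg.norm(fjlt_sketch(x, k=64, q=0.05, rng=rng)))  # roughly 1
```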

Improvement on FJLT: O(d log k) time for k < d^{1/2}, which beats JL for all k up to d^{1/2}; uses only O(d) random bits.

Algorithm (k = d^{1/2}) [A, Liberty 2007]: Φ = BCH · Diagonal · ..., where the k × d BCH matrix comes from an error-correcting code.

Error Correcting Codes: B is a k × d matrix, k = d^{1/2}, obtained from the binary "dual BCH code of designed distance 4" by mapping 0 ↦ k^{-1/2} and 1 ↦ -k^{-1/2}. Properties: the column set is closed under addition; the row set is a subset of the rows of the d × d Hadamard matrix; Bx is computable in time O(d log d) given x; 4-wise independence.
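A toy numerical stand-in (an assumption for illustration: here B is k random rows of the d × d Hadamard matrix scaled by k^{-1/2}; the real construction takes the specific row set given by the dual BCH code, which is what guarantees the 4-wise independence and the norm bound on the next slide):

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
d = 1024
k = int(np.sqrt(d))                                  # k = d^{1/2} = 32

rows = rng.choice(d, size=k, replace=False)          # stand-in for the BCH row set
B = hadamard(d)[rows] / np.sqrt(k)                   # entries are +- k^{-1/2}
D = rng.choice([-1.0, 1.0], size=d)                  # random diagonal of signs

x = rng.normal(size=d); x /= np.linalg.norm(x)
print(np.linalg.norm(B @ (D * x)))                   # concentrates around ||x||_2 = 1
```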

Error Correcting Codes. Fact (follows easily from the properties of dual BCH): ||B^T||_{2→4} = O(1), i.e., for every y ∈ R^k with ||y||_2 = 1 we have ||y^T B||_4 = O(1).

Rademacher Series on Error Correcting Codes. Look at the random vector BDx ∈ (R^k, ℓ_2): BDx = Σ_{i=1..d} D_ii x_i B_i, where each D_ii ∈_R {±1} and B_i is the i-th column of B. Writing M_i = x_i B_i, this is Σ_i D_ii M_i.

Talagrand's Concentration Bound for Rademacher Series: for Z = ||Σ_i D_ii M_i||_p (in our case p = 2), Pr[|Z - EZ| > ε] = O(exp{-ε² / (4 ||M||²_{2→p})}), where M is the matrix with columns M_i.

Rademacher Series on Error Correcting Codes. For the r.v. BDx ∈ (R^k, ℓ_2), let Z = ||BDx||_2 = ||Σ_i D_ii M_i||_2. Then ||M||_{2→2} ≤ ||x||_4 ||B^T||_{2→4} (Cauchy-Schwarz), and by the ECC properties ||M||_{2→2} ≤ ||x||_4 · O(1). Trivially, EZ = ||x||_2 = 1.
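The Cauchy-Schwarz step can be spelled out as follows (a sketch, with M denoting the k × d matrix whose columns are M_i = x_i B_i):

```latex
% For any y in R^k with ||y||_2 = 1, the i-th entry of M^T y is x_i (B^T y)_i, so
\|M\|_{2\to 2} = \sup_{\|y\|_2=1} \|M^\top y\|_2
 = \sup_{\|y\|_2=1} \Bigl(\sum_i x_i^2 \,(B^\top y)_i^2\Bigr)^{1/2}
 \le \|x\|_4 \sup_{\|y\|_2=1} \|B^\top y\|_4
 = \|x\|_4 \,\|B^\top\|_{2\to 4}.
```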

Rademacher Series on Error Correcting Codes. With Z = ||BDx||_2 = ||Σ_i D_ii M_i||_2, we have ||M||_{2→2} = O(||x||_4) and EZ = 1, so Talagrand's bound Pr[|Z - EZ| > ε] = O(exp{-ε² / (4||M||²_{2→2})}) gives Pr[|Z - 1| > ε] = O(exp{-ε² / ||x||_4²}). The challenge asks for deviation more than ε with probability at most exp{-kε²}, so: how do we get ||x||_4² = O(k^{-1}) = O(d^{-1/2})?

Controlling ||x||_4². How to get ||x||_4² = O(k^{-1}) = O(d^{-1/2})? If you think about it for a second: a "random" x has ||x||_4² = O(d^{-1/2}), but a "random" x is easy to reduce anyway: just output its first k coordinates. Are we asking for too much? No: a truly random x has a strong bound on ||x||_p for all p > 2.

Controlling ||x||_4². We can multiply x by an orthogonal matrix without changing ||x||_2; try the matrix HD (HD was used in [AC06] to control ||HDx||_∞). Let Z = ||HDx||_4 = ||Σ_i D_ii x_i H_i||_4 = ||Σ_i D_ii M_i||_4 with M_i = x_i H_i. By Talagrand, Pr[|Z - EZ| > t] = O(exp{-t² / (4||M||²_{2→4})}); EZ = O(d^{-1/4}) (trivial); and ||M||_{2→4} ≤ ||H||_{4/3→4} ||x||_4 (Cauchy-Schwarz).

Controlling ||x||_4² (continued). Continuing with Z = ||Σ_i D_ii M_i||_4, M_i = x_i H_i: Pr[|Z - EZ| > t] = O(exp{-t² / (4||M||²_{2→4})}) (Talagrand); EZ = O(d^{-1/4}) (trivial); ||M||_{2→4} ≤ ||H||_{4/3→4} ||x||_4 (Cauchy-Schwarz); ||H||_{4/3→4} ≤ d^{-1/4} (Hausdorff-Young). Hence ||M||_{2→4} ≤ d^{-1/4} ||x||_4 and Pr[||HDx||_4 > d^{-1/4} + t] ≤ exp{-t² / (d^{-1/2} ||x||_4²)}.
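A quick numerical illustration of why HD helps (assuming numpy/scipy; a dense Hadamard multiply stands in for the fast transform): the spiky worst case x = e_1 has ||x||_4 = 1, but one HD multiplication drives its 4-norm down to the d^{-1/4} level of a "random" vector.

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
d = 1024
H = hadamard(d) / np.sqrt(d)                  # orthonormal Hadamard (dense, for clarity)
D = rng.choice([-1.0, 1.0], size=d)

x = np.zeros(d); x[0] = 1.0                   # worst case: ||x||_4 = 1
y = H @ (D * x)                               # one preconditioning round

print(np.linalg.norm(x, 4), np.linalg.norm(y, 4), d ** -0.25)
# 1.0   ~0.177   0.177   (y is flat: every entry has magnitude d^{-1/2})
```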

Rd Rk=d B x x D ... Controlling ||x||42 how to get ||x||42 = O(d-1/2) ? Pr[ ||HDx||4 > d-1/4+ t ] = exp{-t2/d-1/2||x||42} need some slack k=d1/2- max error probability for challenge: exp{-k} k = t2/d-1/2||x||42  t = k1/2d-1/4||x||4 = ||x||4d-/2

Controlling ||x||_4² (continued). First round: ||HD'x||_4 < d^{-1/4} + ||x||_4 d^{-δ/2}. Second round: ||HD''HD'x||_4 < d^{-1/4} + d^{-1/4-δ/2} + ||x||_4 d^{-δ}. After the r = O(1/δ)-th round: ||HD^{(r)} ··· HD'x||_4 < O(r) d^{-1/4}, and all rounds succeed with probability 1 - O(exp{-k}) by a union bound.

Algorithm for k = d^{1/2-δ}: Φx = B D H D^{(r)} ··· H D^{(1)} x. Running time O(d log d), randomness O(d).
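Putting the pieces together, a dense toy version of the whole map (the number of rounds r and the random-Hadamard-rows stand-in for B are assumptions for illustration; the real algorithm uses the fast Walsh-Hadamard transform and the dual BCH rows to reach O(d log d) time and O(d) random bits):

```python
import numpy as np
from scipy.linalg import hadamard

def fast_jl_sketch(x, k, r, rng):
    """Toy version of Phi = B . D . (H D^(r)) ... (H D^(1)), computed densely."""
    d = x.shape[0]
    H = hadamard(d) / np.sqrt(d)
    y = x
    for _ in range(r):                                   # r = O(1/delta) rounds
        y = H @ (rng.choice([-1.0, 1.0], size=d) * y)    # flatten the 4-norm
    y = rng.choice([-1.0, 1.0], size=d) * y              # final diagonal D
    rows = rng.choice(d, size=k, replace=False)          # stand-in for the BCH rows
    B = hadamard(d)[rows] / np.sqrt(k)
    return B @ y

rng = np.random.default_rng(0)
x = rng.normal(size=1024); x /= np.linalg.norm(x)
print(np.linalg.norm(fast_jl_sketch(x, k=32, r=3, rng=rng)))   # close to 1
```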

Open Problems:
- Go beyond k = d^{1/2}. Conjecture: O(d log d) time is possible for k = d^{1-δ}.
- Approximate linear ℓ2-regression: minimize ||Ax - b||_2 given A, b (overdetermined). State of the art for general inputs: Õ(linear time) when #{variables} < #{equations}^{1/3}. Conjecture: Õ(linear time) is possible "always".
- What is the best you can do in linear time? [A, Liberty, Singer 08]

Open Problem Worthy of Its Own Slide: Prove that JL onto k = d^{1/3} (say) with distortion 1/4 (say) requires Ω(d log d) time. This would establish a similar lower bound for the FFT. A long-standing dormant open problem.