Slide 1
Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform
Nir Ailon, Bernard Chazelle (Princeton University)
Slide 2
Dimension Reduction
Algorithmic metric embedding technique: $(\mathbb{R}^d, \ell_q) \to (\mathbb{R}^k, \ell_p)$, $k \ll d$.
Useful in algorithms requiring exponential (in $d$) time/space.
Johnson-Lindenstrauss for $\ell_2$: what is the exact complexity?
Slide 3
Dimension Reduction Applications
- Approximate nearest neighbor [KOR00, IM98]
- Text analysis [PRTV98]
- Clustering [BOR99, S00]
- Streaming [I00]
- Linear algebra [DKM05, DKM06]: matrix multiplication, SVD computation, $\ell_2$ regression
- VLSI layout design [V98]
- Learning [AV99, D99, V98]...
Slide 4
Three Quick Slides on: Approximate Nearest Neighbor Searching...
Slide 5
Approximate Nearest Neighbor
$P$ = set of $n$ points. For a query $x$ with nearest neighbor $p_{\min} = \arg\min_{p \in P} \mathrm{dist}(x, p)$, return any $p \in P$ with $\mathrm{dist}(x, p) \le (1+\varepsilon)\,\mathrm{dist}(x, p_{\min})$.
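A minimal sketch (my illustration, not from the talk; points assumed stored as numpy rows) of checking this guarantee by brute force:

```python
# Brute-force check of the (1 + eps)-approximate nearest-neighbor guarantee.
import numpy as np

def is_approx_nn(x, P, p, eps):
    """True iff dist(x, p) <= (1 + eps) * dist(x, p_min)."""
    dists = np.linalg.norm(P - x, axis=1)        # distances from x to all n points
    return np.linalg.norm(p - x) <= (1 + eps) * dists.min()
```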
Slide 6
Approximate Nearest Neighbor
$d$ can be very large; $\varepsilon$-approximation beats the “curse of dimensionality”.
[IM98, H01] (Euclidean), [KOR00] (cube): time $O(\varepsilon^{-2} d \log n)$, space $n^{O(\varepsilon^{-2})}$.
Bottleneck: dimension reduction. Using the FJLT: $O(d \log d + \varepsilon^{-3} \log^2 n)$.
Slide 7
The d-Hypercube Case
[KOR00]: binary search on the distance $\ell \in [d]$. For distance $\ell$, multiply the space by a random matrix $B \in \mathbb{Z}_2^{k \times d}$, $k = O(\varepsilon^{-2} \log n)$, $B_{ij}$ i.i.d. $\sim$ biased coin. Preprocess lookup tables for $Bx \pmod 2$.
Our observation: $B$ can be made sparse, using a “handle” to $p \in P$ s.t. $\mathrm{dist}(x, p)$ is bounded.
Time for each step: $O(\varepsilon^{-2} d \log n) \Rightarrow O(d + \varepsilon^{-2} \log n)$.
How to make a similar improvement for $\ell_2$?
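A hedged illustration of this style of $\mathbb{Z}_2$ sketch (my reconstruction patterned on [KOR00], not the talk's code; the bias $q = 1/(2\ell)$ is an assumption). Each row of $B$ takes the parity of a random coordinate subset; two sketches disagree in a row with probability $(1 - (1-2q)^h)/2$ for Hamming distance $h$, so the empirical disagreement rate separates $h \le \ell$ from $h > (1+\varepsilon)\ell$:

```python
# GF(2) parity sketch for Hamming distance at scale ell.
import numpy as np

rng = np.random.default_rng(0)

def gf2_sketch(x, B):
    return (B @ x) % 2                          # k parities over Z_2

d, ell, k = 1024, 32, 400
q = 1.0 / (2 * ell)                             # inclusion bias tied to scale ell
B = (rng.random((k, d)) < q).astype(int)

x = rng.integers(0, 2, d)
y = x.copy(); y[:ell] ^= 1                      # Hamming distance exactly ell
disagree = np.mean(gf2_sketch(x, B) != gf2_sketch(y, B))
print(disagree, (1 - (1 - 2*q)**ell) / 2)       # empirical vs expected rate
```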
Slide 8
Back to Euclidean Space and Johnson-Lindenstrauss...
Slide 9
History of Johnson-Lindenstrauss Dimension Reduction
[JL84]: projection $\Phi$ of $\mathbb{R}^d$ onto a random subspace of dimension $k = c\,\varepsilon^{-2} \log n$. W.h.p.: $\forall\, p_i, p_j \in P$: $\|\Phi p_i - \Phi p_j\|_2 = (1 \pm O(\varepsilon))\,\|p_i - p_j\|_2$.
An $\ell_2 \to \ell_2$ embedding.
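To make the target dimension concrete, a tiny sketch (the constant $c = 4$ is an arbitrary illustrative choice; the lemma only guarantees some constant):

```python
# JL target dimension k = c * eps^-2 * ln n -- note it is independent of d.
import math

def jl_dim(n, eps, c=4.0):
    return math.ceil(c * math.log(n) / eps**2)

print(jl_dim(n=10**6, eps=0.1))   # ~5500: a million points land in a few thousand dims
```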
Slide 10
History of Johnson-Lindenstrauss Dimension Reduction
[FM87], [DG99]: simplified proof, improved constant $c$.
$\Phi \in \mathbb{R}^{k \times d}$: random orthogonal matrix with rows $\phi_1, \phi_2, \dots, \phi_k$, $\|\phi_i\|_2 = 1$, $\phi_i \cdot \phi_j = 0$.
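A minimal sketch (my illustration) of drawing such a matrix, via QR factorization of a Gaussian matrix:

```python
# Random k x d matrix with exactly orthonormal rows.
import numpy as np

def random_orthogonal_rows(k, d, rng):
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # d x k with orthonormal columns
    return Q.T                                        # k x d with orthonormal rows

rng = np.random.default_rng(0)
Phi = random_orthogonal_rows(8, 128, rng)
print(np.allclose(Phi @ Phi.T, np.eye(8)))            # True: ||phi_i|| = 1, phi_i . phi_j = 0
```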
Slide 11
History of Johnson-Lindenstrauss Dimension Reduction
[IM98]: $\Phi \in \mathbb{R}^{k \times d}$, $\Phi_{ij}$ i.i.d. $\sim N(0, 1/d)$. Rows $\phi_1, \dots, \phi_k$: $E\|\phi_i\|_2^2 = 1$, $E[\phi_i \cdot \phi_j] = 0$.
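A hedged sketch of this Gaussian projection (the final $\sqrt{d/k}$ rescaling is my normalization convention, chosen so expected squared norms are preserved):

```python
# Gaussian JL projection: entries i.i.d. N(0, 1/d), then rescale by sqrt(d/k).
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 500, 2048, 0.2
k = int(4 * np.log(n) / eps**2)

P = rng.standard_normal((n, d))                          # n points in R^d
Phi = rng.normal(0.0, 1.0 / np.sqrt(d), size=(k, d))     # Phi_ij ~ N(0, 1/d)
Q = (P @ Phi.T) * np.sqrt(d / k)                         # so E||Phi v||^2 = ||v||^2

v = P[0] - P[1]
print(np.linalg.norm(Q[0] - Q[1]) / np.linalg.norm(v))   # ~ 1, within 1 +/- eps
```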
Slide 12
History of Johnson-Lindenstrauss Dimension Reduction
[A03]: need only tight concentration of $|\phi_i \cdot v|^2$. $\Phi \in \mathbb{R}^{k \times d}$, $\Phi_{ij}$ i.i.d. $\sim \{+1$ w.p. $1/2$, $-1$ w.p. $1/2\}$ (suitably scaled); $E\|\phi_i\|_2^2 = 1$, $E[\phi_i \cdot \phi_j] = 0$.
Slide 13
History of Johnson-Lindenstrauss Dimension Reduction
[A03], sparse: $\Phi \in \mathbb{R}^{k \times d}$, $\Phi_{ij}$ i.i.d. $\sim \{+1$ w.p. $1/6$, $0$ w.p. $2/3$, $-1$ w.p. $1/6\}$ (suitably scaled); $E\|\phi_i\|_2^2 = 1$, $E[\phi_i \cdot \phi_j] = 0$.
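A hedged sketch covering both of these [A03] distributions (the per-row normalization constants are my convention, chosen so $E\|\phi_i\|_2^2 = 1$):

```python
# Dense sign matrix (+1/-1 w.p. 1/2) and sparse variant (+1, 0, -1 w.p. 1/6, 2/3, 1/6).
import numpy as np

rng = np.random.default_rng(0)

def sign_jl(k, d):
    return rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(d)

def sparse_jl(k, d):
    vals = rng.choice([-1.0, 0.0, 1.0], size=(k, d), p=[1/6, 2/3, 1/6])
    return vals * np.sqrt(3.0 / d)            # raw variance 1/3 -> rescale to 1/d

k, d = 256, 4096
v = rng.standard_normal(d)
for Phi in (sign_jl(k, d), sparse_jl(k, d)):
    print(np.linalg.norm(Phi @ v) * np.sqrt(d / k) / np.linalg.norm(v))   # both ~ 1
```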
Slide 14
Sparse Johnson-Lindenstrauss
Sparsity parameter: $s = \Pr[\Phi_{ij} \neq 0]$.
$s$ cannot be $o(1)$, due to the “hidden coordinate”:
(figure: $v \in \mathbb{R}^d$, all zeros except a few isolated 1s; a sparse $\Phi$ is likely to miss them)
Slide 15
Uncertainty Principle
$v$ sparse $\Rightarrow$ $\hat{v}$ dense, where $\hat{v} = Hv$.
$H$: Walsh-Hadamard matrix; the Fourier transform on $\{0,1\}^{\log_2 d}$.
Computable in time $O(d \log d)$.
Isometry: $\|\hat{v}\|_2 = \|v\|_2$.
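A minimal sketch of the fast transform (standard textbook FWHT, not code from the talk); with the $1/\sqrt{2}$ butterfly scaling it is orthonormal, hence an isometry:

```python
# Normalized fast Walsh-Hadamard transform in O(d log d), for d a power of 2.
import numpy as np

def fwht(v):
    v = v.astype(float)
    h = 1
    while h < len(v):                          # log2(d) rounds of butterflies
        for i in range(0, len(v), 2 * h):
            a, b = v[i:i+h].copy(), v[i+h:i+2*h].copy()
            v[i:i+h], v[i+h:i+2*h] = a + b, a - b
        v /= np.sqrt(2.0)                      # keep the transform orthonormal
        h *= 2
    return v

v = np.zeros(8); v[3] = 1.0                    # a maximally sparse input...
print(fwht(v))                                 # ...spreads to magnitude 1/sqrt(8) everywhere
print(np.linalg.norm(fwht(v)))                 # isometry: norm preserved (1.0)
```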
Slide 16
Adding Randomization
$H$ deterministic, invertible $\Rightarrow$ we’re back to square one! (An adversary can hand us $H^{-1}$ of a sparse vector.)
Precondition $H$ with a random diagonal $D = \mathrm{Diag}(\pm 1, \dots, \pm 1)$.
Computable in time $O(d)$; an isometry.
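A small demo of the point (my illustration, reusing `fwht()` from the sketch above): $H$ is self-inverse, so the adversarial input $v = He_1$ satisfies $Hv = e_1$, a hidden coordinate again; the random signs in $D$ spread $HDv$ back out:

```python
# Why H alone is not enough, and why the O(d) diagonal D fixes it.
import numpy as np

rng = np.random.default_rng(1)
d = 1024
e1 = np.zeros(d); e1[0] = 1.0
v = fwht(e1)                                   # adversarial: H v = e1 exactly
print(np.abs(fwht(v)).max())                   # 1.0 -- all mass on one coordinate

D = rng.choice([-1.0, 1.0], size=d)            # random diagonal, O(d) to apply
print(np.abs(fwht(D * v)).max())               # ~ sqrt(log d / d): mass is spread
```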
Slide 17
The $\ell_\infty$-Bound Lemma
W.h.p.: $\forall\, p_i, p_j \in P \subseteq \mathbb{R}^d$: $\|HD(p_i - p_j)\|_\infty \le O(d^{-1/2} \log^{1/2} n)\,\|p_i - p_j\|_2$.
Rules out: $HD(p_i - p_j)$ = “hidden coordinate vector”! Instead...
Slide 18
Hidden Coordinate-Set
Worst case $v = p_i - p_j$ (assuming the $\ell_\infty$ bound; assume $\|v\|_2 = 1$): there is $J \subseteq [d]$, $|J| = \Theta(d / \log n)$, with
$\forall j \in J$: $|v_j| = \Theta(d^{-1/2} \log^{1/2} n)$
$\forall j \notin J$: $v_j = 0$
Slide 19
Fast J-L Transform
FJLT $= \Phi\, H\, D$:
- $D = \mathrm{Diag}(\pm 1)$ (random signs)
- $H$ = Hadamard
- $\Phi$ = sparse JL: $\Phi_{ij}$ i.i.d. $\sim \{0$ w.p. $1-s$, $N(0,1)$ w.p. $s\}$
- $\ell_2 \to \ell_1$: $s = \varepsilon^{-1} \log n / d$; bottleneck: bias of $|\phi_i \cdot v|$
- $\ell_2 \to \ell_2$: $s = \log^2 n / d$; bottleneck: variance of $|\phi_i \cdot v|^2$
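Putting the pieces together, a hedged end-to-end sketch of the $\Phi H D$ structure (reusing `fwht()` from above; the normalization and the choice of constants are illustrative, not the paper's tuned values, and $\Phi$ is materialized densely for clarity rather than exploiting its sparsity):

```python
# FJLT sketch: random signs D, fast Hadamard H, then sparse Gaussian Phi.
import numpy as np

rng = np.random.default_rng(0)

def fjlt(v, k, s, D, rng):
    w = fwht(D * v)                            # H D v in O(d log d)
    d = len(v)
    mask = rng.random((k, d)) < s              # sparse support of Phi
    Phi = np.where(mask, rng.standard_normal((k, d)), 0.0)
    return (Phi @ w) / np.sqrt(k * s)          # so E||result||^2 = ||v||^2

n, d, eps = 1000, 2048, 0.25
k = int(4 * np.log(n) / eps**2)
s = min(1.0, np.log(n)**2 / d)                 # the slide's l2 -> l2 sparsity rate
D = rng.choice([-1.0, 1.0], size=d)

v = np.zeros(d); v[0] = 1.0                    # even a 1-sparse input is fine now
print(np.linalg.norm(fjlt(v, k, s, D, rng)))   # ~ 1 = ||v||_2
```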
Slide 20
Applications
Approximate nearest neighbor in $(\mathbb{R}^d, \ell_2)$.
$\ell_2$ regression: minimize $\|Ax - b\|_2$, $A \in \mathbb{R}^{n \times d}$, over-constrained ($d \ll n$).
[DMM06]: approximate by sampling (non-constructive).
[Sarlos06]: using the FJLT $\Rightarrow$ constructive.
More applications...?
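A hedged sketch-and-solve illustration for the regression application (my example: for brevity $S$ is a plain Gaussian JL matrix; [Sarlos06] uses the FJLT for speed, and the sketch size $k$ is an arbitrary choice well above $d$):

```python
# Solve the k x d sketched least-squares problem instead of the n x d one.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10000, 50
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

k = 500                                         # sketch size: d << k << n
S = rng.standard_normal((k, n)) / np.sqrt(k)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

residual = lambda x: np.linalg.norm(A @ x - b)
print(residual(x_sketch) / residual(x_exact))   # ~ 1 + eps
```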
Slide 21
Interesting Problem I
Improvements and lower bounds for J-L computation.
Slide 22
Interesting Problem II
Dimension reduction is sampling. Sampling by random walk:
- expander graphs for uniform sampling
- convex bodies for volume estimation
[Kac59]: random walk on the orthogonal group. For $t = 1 \dots T$: pick $i, j \in_R [d]$, $\theta \in_R [0, 2\pi)$, and rotate:
$v_i \leftarrow v_i \cos\theta + v_j \sin\theta$
$v_j \leftarrow -v_i \sin\theta + v_j \cos\theta$
Output $(v_1, \dots, v_k)$ as the dimension reduction of $v$. How many steps for a J-L guarantee? [CCL01], [DS00], [P99]...
Thank You!
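A hedged sketch of this random walk (my illustration: the slide's two updates form one simultaneous Givens rotation, so a temporary is needed; the number of steps $T$ is exactly the open question, and the value below plus the $\sqrt{d/k}$ rescaling of the kept coordinates are arbitrary choices):

```python
# Kac-style random walk: T random Givens rotations, then keep k coordinates.
import numpy as np

rng = np.random.default_rng(0)

def kac_walk(v, k, T, rng):
    v = v.copy()
    d = len(v)
    for _ in range(T):
        i, j = rng.choice(d, size=2, replace=False)   # i, j in_R [d]
        theta = rng.uniform(0.0, 2.0 * np.pi)         # theta in_R [0, 2*pi)
        vi, vj = v[i], v[j]                           # rotate simultaneously
        v[i] = vi * np.cos(theta) + vj * np.sin(theta)
        v[j] = -vi * np.sin(theta) + vj * np.cos(theta)
    return v[:k] * np.sqrt(d / k)                     # rescale the kept block

d, k = 1024, 64
v = rng.standard_normal(d)
u = kac_walk(v, k, T=20 * d * int(np.log(d)), rng=rng)
print(np.linalg.norm(u) / np.linalg.norm(v))          # ~ 1 once the walk has mixed
```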