Sketching, Sampling and other Sublinear Algorithms:
Euclidean space: dimension reduction and NNS
Alex Andoni (MSR SVC)
A Sketching Problem: "To be or not to be" vs. "To sketch or not to sketch": similar? Compress each string to a short sketch (e.g., "beto") and decide similarity from the sketches alone.
Sketch from LSH [Broder’97]: for the Jaccard coefficient
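Here is a minimal sketch of the [Broder’97] min-hash idea. It makes two assumptions that are not on the slide: each string becomes its set of lowercase words, and salted BLAKE2b hashes behave like random permutations. Each set is sketched by its k minimum hash values. For a single hash function, the two minima agree with probability |A∩B|/|A∪B|, so the fraction of agreeing coordinates estimates the Jaccard coefficient. The function names are illustrative.

```python
import hashlib


def salted_hash(i, s):
    """i-th hash function: 64-bit BLAKE2b digest of the salted string."""
    digest = hashlib.blake2b(f"{i}:{s}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(items, k):
    """Sketch a set as its k minimum hash values, one per hash function."""
    distinct = set(items)
    return [min(salted_hash(i, s) for s in distinct) for i in range(k)]


def estimate_jaccard(sketch1, sketch2):
    """Fraction of coordinates on which the two sketches agree.

    For one hash function, the minima agree exactly when the minimum over
    A | B lies in A & B, which (for a random permutation) has probability
    |A & B| / |A | B|.
    """
    agree = sum(u == v for u, v in zip(sketch1, sketch2))
    return agree / len(sketch1)


A = "to be or not to be".split()
B = "to sketch or not to sketch".split()
sa = minhash_sketch(A, k=1000)
sb = minhash_sketch(B, k=1000)
# Exact Jaccard: |{to, or, not}| / |{to, be, or, not, sketch}| = 3/5
print(estimate_jaccard(sa, sb))
```

The sketch size k controls accuracy. The estimate is an average of k independent indicators, so its standard deviation is about sqrt(J(1-J)/k).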
General Theory: embeddings. Reduce a problem under a hard distance to the same problem under a simpler one, via a map f.
Distances: Euclidean distance ($\ell_2$); Hamming distance; edit distance between two strings; Earth-Mover (transportation) Distance.
Problems: compute the distance between two points; diameter/close-pair of a set S; clustering, MST, etc.; Nearest Neighbor Search.
Embeddings: landscape
Dimension Reduction
Main intuition
1D embedding
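This sketch assumes one standard way to build a 1D embedding of Euclidean space: project onto a random Gaussian direction g, giving f(x) = Σ_i g_i x_i. The Gaussian is 2-stable and f is linear, so f(x) - f(y) = f(x - y) is distributed as N(0, ‖x - y‖²). Averaging (f(x) - f(y))² over independent draws of g therefore estimates ‖x - y‖². The function names are illustrative.

```python
import math
import random


def embed_1d(x, g):
    """Project x onto the random direction g: f(x) = sum_i g_i * x_i."""
    return sum(gi * xi for gi, xi in zip(g, x))


def estimate_distance(x, y, trials=20000, seed=1):
    """Estimate ||x - y||_2 from many independent 1D embeddings.

    Each trial draws a fresh g with i.i.d. N(0, 1) entries. Since
    f(x) - f(y) ~ N(0, ||x - y||^2), the mean of the squared difference
    estimates ||x - y||^2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in x]
        diff = embed_1d(x, g) - embed_1d(y, g)
        total += diff * diff
    return math.sqrt(total / trials)


x = [3.0, 0.0, 4.0]
y = [0.0, 0.0, 0.0]
print(estimate_distance(x, y))  # close to ||x - y|| = 5
```

A single 1D embedding has constant distortion only in expectation. One draw can badly shrink or stretch a distance, which is why many independent coordinates are needed.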
Full Dimension Reduction
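Here is a minimal sketch of full dimension reduction. It assumes the usual Gaussian construction: stack k independent 1D embeddings and rescale, so that F(x) = Gx/√k, where G is a k×d matrix with i.i.d. N(0,1) entries. Then E‖F(x) - F(y)‖² = ‖x - y‖². The code checks the distortion ratio on a few random points; the sizes d = 1000 and k = 200 are arbitrary.

```python
import math
import random


def make_jl_map(d, k, seed=0):
    """Return F(x) = G x / sqrt(k), with G a k-by-d matrix of i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    G = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)

    def F(x):
        # Each output coordinate is an independent 1D embedding of x, rescaled
        return [scale * sum(gij * xj for gij, xj in zip(row, x)) for row in G]

    return F


def norm(v):
    return math.sqrt(sum(t * t for t in v))


d, k = 1000, 200
rng = random.Random(42)
points = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(5)]
F = make_jl_map(d, k)
images = [F(p) for p in points]
for i in range(len(points)):
    for j in range(i + 1, len(points)):
        before = norm([a - b for a, b in zip(points[i], points[j])])
        after = norm([a - b for a, b in zip(images[i], images[j])])
        print(f"pair ({i},{j}): distortion ratio {after / before:.3f}")
```

F is linear, so every pairwise ratio has the same distribution as ‖F(z)‖/‖z‖ for a single vector z. That single-vector ratio is exactly what a concentration bound controls.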
Concentration
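For a fixed x, each coordinate of Gx is an independent N(0, ‖x‖²) variable. So ‖F(x)‖²/‖x‖² is a χ²_k variable divided by k. The standard tail bound Pr[|‖F(x)‖²/‖x‖² - 1| > ε] ≤ 2e^{-Ω(ε²k)} says the failure probability drops exponentially in k. The Monte Carlo sketch below samples that ratio directly and shows the failure rate shrinking as k grows. Its parameters (ε = 0.5, 300 trials) are illustrative.

```python
import random


def failure_rate(k, eps, trials=300, seed=7):
    """Fraction of trials where ||F(x)||^2 / ||x||^2 leaves [1 - eps, 1 + eps].

    For F(x) = G x / sqrt(k), the ratio is (1/k) * sum of k squared
    independent N(0, 1) variables, so we sample that sum directly.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        ratio = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) / k
        if abs(ratio - 1.0) > eps:
            failures += 1
    return failures / trials


for k in (10, 50, 200, 400):
    print(k, failure_rate(k, eps=0.5))
```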
Dimension Reduction: wrap-up
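The standard Johnson-Lindenstrauss statement follows from the concentration bound via a union bound over pairs. The constant hidden in $O(\cdot)$ is left unspecified here.

```latex
Let $\varepsilon \in (0,1)$, let $P \subset \mathbb{R}^d$ with $|P| = n$, and let
$F(x) = \tfrac{1}{\sqrt{k}}\, G x$ with $G \in \mathbb{R}^{k \times d}$, $G_{ij} \sim \mathcal{N}(0,1)$ i.i.d.
For each fixed pair $x, y \in P$:
\[
\Pr\Big[\, \big|\, \|F(x)-F(y)\|^2 - \|x-y\|^2 \,\big| > \varepsilon \|x-y\|^2 \Big] \le 2e^{-\Omega(\varepsilon^2 k)}.
\]
A union bound over the $\binom{n}{2} < n^2$ pairs shows that
\[
k = O\!\left(\frac{\log n}{\varepsilon^2}\right)
\]
suffices for $(1-\varepsilon)\|x-y\|^2 \le \|F(x)-F(y)\|^2 \le (1+\varepsilon)\|x-y\|^2$
to hold for all $x, y \in P$ simultaneously, with probability at least $1 - 1/n$.
```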
NNS for Euclidean space [Datar-Immorlica-Indyk-Mirrokni’04]
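Here is a minimal sketch of the p-stable hash associated with [Datar-Immorlica-Indyk-Mirrokni’04]. It assumes the form in which that hash is usually stated, h(x) = ⌊(a·x + b)/w⌋, where a has i.i.d. Gaussian entries and b is uniform in [0, w). The hash projects onto a random line and cuts the line into randomly shifted intervals of width w, so near points collide more often than far ones. The width w = 4 and the test points are arbitrary choices.

```python
import math
import random


def make_p_stable_hash(d, w, seed=0):
    """One hash h(x) = floor((a . x + b) / w).

    a has i.i.d. N(0, 1) entries (the Gaussian is 2-stable); b ~ Uniform[0, w).
    """
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)

    def h(x):
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)

    return h


def collision_rate(x, y, w, trials=2000):
    """Fraction of independently drawn hashes under which x and y collide."""
    hits = 0
    for seed in range(trials):
        h = make_p_stable_hash(len(x), w, seed)
        hits += h(x) == h(y)
    return hits / trials


d = 20
origin = [0.0] * d
near_pt = [1.0] + [0.0] * (d - 1)  # distance 1 from origin
far_pt = [4.0] + [0.0] * (d - 1)   # distance 4 from origin
print(collision_rate(origin, near_pt, w=4.0), collision_rate(origin, far_pt, w=4.0))
```

A full LSH index concatenates several such hashes into one key and builds several independent tables. That layer is omitted here.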
Near-Optimal LSH [A-Indyk’06]
Regular grid → grid of balls: p can hit empty space, so take more such grids until p is in a ball.
Need (too) many grids of balls, so start by projecting to dimension $t$.
Choice of reduced dimension $t$? Tradeoff between the number of hash tables, $n^\rho$, and the time to hash, $t^{O(t)}$; the analysis gives $\rho$ approaching $1/c^2$ as $t$ grows.
Total query time: $dn^{1/c^2+o(1)}$.
[Figure: point p among grids of balls in 2D, and the projection to $\mathbb{R}^t$]
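The hypothetical sketch below illustrates grid-of-balls hashing in the spirit of [A-Indyk’06]. It first projects to ℝ^t with a Gaussian matrix. It then scans randomly shifted grids of balls in a fixed order and returns the first ball that contains the projected point. The grid spacing 4w, the ball radius w, and the cap of 200 grids are illustrative assumptions, not values from the slide.

```python
import math
import random


def make_ball_carving_hash(d, t, w, num_grids=200, seed=0):
    """Hypothetical grid-of-balls hash.

    1. Project x from R^d to R^t with a t-by-d Gaussian matrix.
    2. Scan randomly shifted grids (spacing 4w) of balls of radius w in a
       fixed order; return (grid index, cell) of the first ball containing
       the projected point. Spacing 4w and radius w are assumptions.
    """
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(t)]
    spacing = 4.0 * w
    shifts = [[rng.uniform(0.0, spacing) for _ in range(t)]
              for _ in range(num_grids)]

    def h(x):
        y = [sum(aij * xj for aij, xj in zip(row, x)) for row in A]
        for u, s in enumerate(shifts):
            # Nearest ball center in the shifted grid s + spacing * Z^t;
            # balls are disjoint (w < spacing / 2), so only it can contain y.
            cell = tuple(round((yi - si) / spacing) for yi, si in zip(y, s))
            center = [si + spacing * ci for si, ci in zip(s, cell)]
            if math.dist(y, center) <= w:
                return (u, cell)
        return None  # y hit empty space in every grid (rare for small t)

    return h


d, t, w = 50, 2, 1.0
rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
near = [xi + 0.01 for xi in x]  # distance about 0.07
far = [xi + 2.0 for xi in x]    # distance about 14
hashes = [make_ball_carving_hash(d, t, w, seed=s) for s in range(100)]
print(sum(g(x) == g(near) for g in hashes), sum(g(x) == g(far) for g in hashes))
```

Each grid covers a fraction of space equal to the ball volume divided by the cell volume, and this fraction shrinks rapidly as t grows. That is why the number of grids needed, and hence the hashing time, blows up with t.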
Open question: [Prob. needle of length 1 is not cut] ≥ [Prob. needle of length $c$ is not cut]$^{1/c^2}$
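The quantity in the open question is the LSH exponent. Write P1 and P2 for the two not-cut probabilities. Then ρ = log(1/P1)/log(1/P2), and the inequality P1 ≥ P2^{1/c²} is exactly ρ ≤ 1/c². As a baseline (an illustrative choice, not from the slide), the sketch below estimates ρ by Monte Carlo for a randomly shifted cubic grid in ℝ^t, using needles with uniformly random direction. For these parameters the estimate comes out well above 1/c².

```python
import math
import random


def not_cut_prob(length, t, w, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[needle of this length is not cut] for a
    randomly shifted axis-aligned grid of cubes with side w in R^t.

    Cubes are convex, so the needle is uncut iff both endpoints share a cell.
    """
    rng = random.Random(seed)
    kept = 0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(t)]
        norm = math.sqrt(sum(v * v for v in g))
        end = [length * v / norm for v in g]  # needle from origin, random direction
        shift = [rng.uniform(0.0, w) for _ in range(t)]
        kept += all(math.floor(s / w) == math.floor((e + s) / w)
                    for e, s in zip(end, shift))
    return kept / trials


def rho(c, t, w):
    """LSH exponent log(1/P1) / log(1/P2) for needles of length 1 and c."""
    p1 = not_cut_prob(1.0, t, w)
    p2 = not_cut_prob(c, t, w)
    return math.log(1.0 / p1) / math.log(1.0 / p2)


c = 2.0
print(rho(c, t=2, w=4.0), 1.0 / c ** 2)
```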
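Time-Space Trade-offs
Upper bounds range from high space with low query time (one hash table lookup!), through medium space and medium time, to low space with high query time: [KOR’98, IM’98, Pan’06], [IM’98], [DIIM’04, AI’06], [AI’06], [Ind’01, Pan’06].
Lower bounds: with space $n^{o(1/\varepsilon^2)}$, a query needs $\omega(1)$ memory lookups [AIP’06]; with space $n^{1+o(1/c^2)}$, a query needs $\omega(1)$ memory lookups [PTW’08, PTW’10].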
NNS beyond LSH
Finale