Algorithmic High-Dimensional Geometry 1 Alex Andoni (Microsoft Research SVC)


Prism of nearest neighbor search (NNS)
[Diagram: high-dimensional geometry (dimension reduction, space partitions, embeddings, …), NNS, ???]

Nearest Neighbor Search (NNS)

Motivation
- Generic setup:
  - Points model objects (e.g. images)
  - Distance models (dis)similarity measure
- Application areas:
  - machine learning: k-NN rule
  - speech/image/video/music recognition, vector quantization, bioinformatics, etc.
- Distance can be:
  - Hamming, Euclidean, edit distance, Earth-mover distance, etc.
- Primitive for other problems:
  - find the similar pairs in a set D, clustering, …

2D case

High-dimensional case
[Table: Algorithm | Query time | Space, with rows "Full indexing" and "No indexing – linear scan"; cell values lost in extraction]
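The "no indexing" row of the table can be illustrated with a linear scan. This is a minimal sketch, assuming Euclidean distance and points stored as tuples; the function name `linear_scan_nn` is illustrative and does not come from the slides.

```python
import math

def linear_scan_nn(points, q):
    """Return (index, distance) of the point closest to q by checking every point."""
    best_i, best_d = -1, math.inf
    for i, p in enumerate(points):
        d = math.dist(p, q)  # Euclidean distance (an assumption; any metric would do)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(linear_scan_nn(points, (0.9, 1.2)))
```

Each query touches all n points in all d coordinates, so the cost grows as n·d, but no structure beyond the data itself is stored.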

Approximate NNS (c-approximate)
[Figure: query q, point p, radii r and cr]
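The slide's formal statement did not survive extraction. The sketch below assumes the standard c-approximate r-near neighbor guarantee: if some point lies within distance r of q, any point within distance cr is an acceptable answer. The helper name `is_valid_c_approx_answer` is hypothetical.

```python
import math

def is_valid_c_approx_answer(points, q, r, c, answer):
    """Check an answer against the (assumed) c-approximate r-near neighbor guarantee.

    If some point lies within distance r of q, the answer must lie within c*r.
    If no such point exists, any answer (including None) is acceptable.
    """
    has_r_near = any(math.dist(p, q) <= r for p in points)
    if not has_r_near:
        return True
    return answer is not None and math.dist(answer, q) <= c * r

points = [(0.0, 0.0), (1.5, 0.0), (5.0, 5.0)]
q = (0.2, 0.0)
# (1.5, 0) is not the nearest point, but it is within c*r = 1.5 of q.
print(is_valid_c_approx_answer(points, q, r=0.5, c=3.0, answer=(1.5, 0.0)))
```

The relaxation is what lets an algorithm stop early: it may return a point that is not the true nearest neighbor as long as it is not much farther away.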

Heuristic for Exact NNS
[Figure: query q, point p, radii r and cr; c-approximate]

Approximation Algorithms for NNS
- A vast literature:
  - milder dependence on dimension: [Arya-Mount’93], [Clarkson’94], [Arya-Mount-Netanyahu-Silverman-Wu’98], [Kleinberg’97], [Har-Peled’02], [Chan’02], …
  - little to no dependence on dimension: [Indyk-Motwani’98], [Kushilevitz-Ostrovsky-Rabani’98], [Indyk’98, ’01], [Gionis-Indyk-Motwani’99], [Charikar’02], [Datar-Immorlica-Indyk-Mirrokni’04], [Chakrabarti-Regev’04], [Panigrahy’06], [Ailon-Chazelle’06], [A-Indyk’06], …

Dimension Reduction

Motivation

Dimension Reduction

Idea:

1D embedding


Full Dimension Reduction

Concentration

Dimension Reduction: wrap-up
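The bodies of the dimension-reduction slides were lost, so the following is only a sketch of one standard construction, a Gaussian random projection in the spirit of Johnson-Lindenstrauss. The dimensions, seed and helper names are assumptions chosen for illustration.

```python
import math
import random

def random_projection(d, k, rng):
    """Sample a k x d matrix with independent N(0, 1/k) entries."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(matrix, x):
    """Map x in R^d to R^k with a plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

rng = random.Random(0)
d, k = 1000, 200          # illustrative sizes, not taken from the slides
G = random_projection(d, k, rng)
p = [rng.uniform(-1, 1) for _ in range(d)]
q = [rng.uniform(-1, 1) for _ in range(d)]
original = math.dist(p, q)
reduced = math.dist(project(G, p), project(G, q))
print(original, reduced, reduced / original)
```

With k = 1 this degenerates to projecting onto a single random direction, whose squared length matches the original squared distance only in expectation; taking k coordinates and averaging is what makes the ratio concentrate near 1.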

Space Partitions

Locality-Sensitive Hashing [Indyk-Motwani’98]
[Figure: points q and p; axis mark 1; label “not-so-small”]

NNS for Euclidean space [Datar-Immorlica-Indyk-Mirrokni’04]
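The hash family of the cited paper projects a point onto a random Gaussian direction, adds a random offset, and rounds down to a bucket of width w: h(p) = floor((<a, p> + b) / w). The sketch below estimates how often a near pair and a far pair collide; the width w = 4, the distances and the trial count are illustrative assumptions.

```python
import math
import random

def make_hash(d, w, rng):
    """Sample one hash function h(p) = floor((<a, p> + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    def h(p):
        return math.floor((sum(ai * pi for ai, pi in zip(a, p)) + b) / w)
    return h

def collision_rate(p, q, w, trials, rng):
    """Estimate Pr[h(p) == h(q)] over random choices of h."""
    hits = 0
    for _ in range(trials):
        h = make_hash(len(p), w, rng)
        hits += h(p) == h(q)
    return hits / trials

rng = random.Random(1)
p = [0.0] * 10
near = [0.5] + [0.0] * 9   # at distance 0.5 from p
far = [4.0] + [0.0] * 9    # at distance 4 from p
print(collision_rate(p, near, w=4.0, trials=2000, rng=rng))
print(collision_rate(p, far, w=4.0, trials=2000, rng=rng))
```

The near pair should collide noticeably more often than the far pair, which is the property an LSH family needs.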

Putting it all together
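The slide body is missing, so this is a sketch of the usual way the pieces are combined rather than the author's exact construction: L hash tables, each keyed by k concatenated hashes of the form floor((<a, p> + b) / w), with a query scanning only the buckets it falls into. The class name `LSHIndex` and all parameter values are assumptions.

```python
import math
import random
from collections import defaultdict

class LSHIndex:
    """L hash tables, each keyed by k concatenated hashes floor((<a, p> + b) / w)."""

    def __init__(self, d, k, L, w, seed=0):
        rng = random.Random(seed)
        self.w = w
        # One (a, b) pair per elementary hash; k of them per table.
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(d)], rng.uniform(0.0, w))
             for _ in range(k)]
            for _ in range(L)
        ]
        self.tables = [defaultdict(list) for _ in range(L)]
        self.points = []

    def _key(self, table_id, p):
        return tuple(
            math.floor((sum(ai * pi for ai, pi in zip(a, p)) + b) / self.w)
            for a, b in self.funcs[table_id]
        )

    def insert(self, p):
        idx = len(self.points)
        self.points.append(p)
        for t, table in enumerate(self.tables):
            table[self._key(t, p)].append(idx)

    def query(self, q):
        """Return the closest point among those sharing a bucket with q, or None."""
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, q), ()))
        if not candidates:
            return None
        return min((self.points[i] for i in candidates), key=lambda p: math.dist(p, q))

rng = random.Random(2)
index = LSHIndex(d=8, k=4, L=20, w=4.0, seed=3)
data = [[rng.uniform(-50, 50) for _ in range(8)] for _ in range(500)]
for p in data:
    index.insert(p)
q = [x + 0.05 for x in data[0]]   # a slightly perturbed copy of the first point
print(index.query(q) == data[0])
```

Larger k makes each bucket more selective, while larger L raises the chance that a true near neighbor shares at least one bucket with the query.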

Analysis of LSH Scheme
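The analysis itself is not in the extracted text. The sketch below follows the standard Indyk-Motwani style parameter choice, stated here as an assumption: with collision probabilities P1 (near pairs) and P2 (far pairs), set ρ = log(1/P1) / log(1/P2), use k = log n / log(1/P2) hashes per table, and about n^ρ tables (constant factors ignored). The function name is hypothetical.

```python
import math

def lsh_parameters(n, p1, p2):
    """Return (rho, k, L) for n points and collision probabilities p1 > p2."""
    rho = math.log(1.0 / p1) / math.log(1.0 / p2)
    # The small epsilon guards against floating-point noise just above an integer.
    k = math.ceil(math.log(n) / math.log(1.0 / p2) - 1e-9)  # hashes per table
    L = math.ceil(n ** rho - 1e-9)                          # number of tables
    return rho, k, L

print(lsh_parameters(n=2**20, p1=0.5, p2=0.25))
```

Choosing k this way drives the expected number of far points per bucket down to a constant, and n^ρ tables compensate for the near-pair collision probability having dropped to roughly n^(-ρ).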

Better LSH? [A-Indyk’06]
- Regular grid → grid of balls
  - p can hit empty space, so take more such grids until p is in a ball
- Need (too) many grids of balls
  - Start by projecting in dimension t
- Analysis gives [formula lost in extraction]
- Choice of reduced dimension t?
  - Tradeoff between
    - # hash tables, n^ρ, and
    - Time to hash, t^{O(t)}
  - Total query time: dn^{1/c^2+o(1)}
[Figure: 2D grid of balls with point p; projection to R^t]
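The "grids of balls" idea can be sketched as follows: draw a sequence of randomly shifted lattices, put a ball at every lattice vertex, and hash a point to the first ball that contains it. This is an illustration only; the lattice spacing 4w, ball radius w, number of grids and function name are assumptions, and the projection into R^t is taken as already done.

```python
import math
import random

def make_ball_grid_hash(t, w, num_grids, seed=0):
    """Hash points of R^t using a sequence of randomly shifted grids of balls.

    Each grid places a ball of radius w at every vertex of a lattice with
    spacing 4w (the spacing is an assumption, chosen so balls do not overlap).
    """
    rng = random.Random(seed)
    shifts = [[rng.uniform(0.0, 4.0 * w) for _ in range(t)] for _ in range(num_grids)]

    def h(p):
        for g, shift in enumerate(shifts):
            # Nearest lattice vertex of grid g, in integer lattice coordinates.
            cell = tuple(round((pi - si) / (4.0 * w)) for pi, si in zip(p, shift))
            center = [c * 4.0 * w + si for c, si in zip(cell, shift)]
            if math.dist(p, center) <= w:
                return (g, cell)   # p landed in a ball of grid g
        return None                # p hit empty space in every grid

    return h

h = make_ball_grid_hash(t=2, w=1.0, num_grids=200, seed=5)
print(h((0.3, -2.1)), h((0.31, -2.1)))
```

In the plane one grid covers only about a fifth of the space, and the covered fraction shrinks quickly as t grows, which is why many grids are needed and why t has to stay small.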

Proof idea
- Claim: [formula lost in extraction], i.e., [formula lost in extraction]
  - P(r) = probability of collision when ||p-q|| = r
- Intuitive proof:
  - Projection approximately preserves distances [JL]
  - P(r) = intersection / union
  - P(r) ≈ probability that a random point u is beyond the dashed line
  - Fact (high dimensions): the x-coordinate of u has a nearly Gaussian distribution → P(r) ≈ exp(-A·r^2)
[Figure: points p and q at distance r; random point u; x-axis; P(r)]
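The Gaussian-like collision probability accounts for the exponent 1/c^2 in the query time. A short worked step, assuming the standard LSH exponent ρ = log(1/P_1) / log(1/P_2) with P_1 = P(1) and P_2 = P(c) (this definition is not spelled out in the surviving slide text) and ignoring the approximation error in P(r):

```latex
\rho \;=\; \frac{\log\bigl(1/P(1)\bigr)}{\log\bigl(1/P(c)\bigr)}
     \;\approx\; \frac{A \cdot 1^{2}}{A \cdot c^{2}}
     \;=\; \frac{1}{c^{2}}
```

The constant A cancels, so only the ratio of the squared distances remains.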

Open question:
[Prob. needle of length 1 is not cut] ≥ [Prob. needle of length c is not cut]^{1/c^2}

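Time-Space Trade-offs
[Table: Space | Time | Comment | Reference; cell values and row alignment lost in extraction]
- Regime labels (space, query time): low, medium, high
- References, in order of appearance: [AI’06]; [KOR’98, IM’98, Pan’06]; [Ind’01, Pan’06]; [DIIM’04, AI’06]; [IM’98]
- Comments, in order of appearance: ω(1) memory lookups [AIP’06]; ω(1) memory lookups [PTW’08, PTW’10]; [MNP’06, OWZ’11]; 1 mem lookup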

LSH Zoo
[Figure: the texts “To be or not to be” and “To Simons or not to Simons” shown as count vectors (…21102…, …01122…), as bit vectors (…11101…, …01111…) over the words be, to, or, not, Simons, and as sets {be, not, or, to} and {not, or, to, Simons}]
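The three representations in the figure can be reproduced with a few lines of Python. The coordinate order (be, not, or, Simons, to) is an assumption picked because it yields the vectors shown on the slide; the distances computed at the end are illustrative, since the slide text pairing each representation with a distance did not survive.

```python
from collections import Counter

VOCAB = ["be", "not", "or", "simons", "to"]  # coordinate order is an assumption

def count_vector(text):
    """Number of occurrences of each vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in VOCAB]

def bit_vector(text):
    """1 where the word occurs at least once, 0 otherwise."""
    return [int(c > 0) for c in count_vector(text)]

def word_set(text):
    return set(text.lower().split())

a, b = "To be or not to be", "To Simons or not to Simons"
print(count_vector(a), count_vector(b))
print(bit_vector(a), bit_vector(b))

hamming = sum(x != y for x, y in zip(bit_vector(a), bit_vector(b)))
manhattan = sum(abs(x - y) for x, y in zip(count_vector(a), count_vector(b)))
jaccard = len(word_set(a) & word_set(b)) / len(word_set(a) | word_set(b))
print(hamming, manhattan, jaccard)
```

Each representation suggests its own notion of similarity, and each notion needs its own hash family, which is presumably the point of the "zoo".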

LSH in the wild
- safety not guaranteed
- fewer false positives
- fewer tables

Space partitions beyond LSH

Recap for today
[Diagram: high-dimensional geometry (dimension reduction, space partitions, embedding, …), NNS, small dimension, sketching]