Sketching, Sampling and other Sublinear Algorithms: Euclidean space: dimension reduction and NNS Alex Andoni (MSR SVC)

A Sketching Problem: are "To be or not to be" and "To sketch or not to sketch" similar? Can we decide this from short sketches of the two strings instead of the strings themselves?

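Sketch from LSH [Broder'97]: for the Jaccard coefficient $J(A,B) = |A \cap B| / |A \cup B|$ of two sets, a random min-hash $h$ satisfies $\Pr[h(A) = h(B)] = J(A,B)$.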
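A minimal min-hash sketch in Python, assuming each string is reduced to its set of lower-cased words (word shingles are another common choice). The k independent random hash functions are simulated by salting `hashlib.blake2b` with the index i; the helper names (`word_set`, `minhash_sketch`, `estimate_jaccard`) are hypothetical.

```python
import hashlib

def word_set(s: str) -> set[str]:
    # Lower-case and split on whitespace (punctuation is not handled).
    return set(s.lower().split())

def jaccard(a: set[str], b: set[str]) -> float:
    # Exact Jaccard coefficient |A & B| / |A | B|.
    return len(a & b) / len(a | b)

def _h(i: int, x: str) -> int:
    # The i-th "random" hash function, simulated by salting a cryptographic hash.
    d = hashlib.blake2b(f"{i}:{x}".encode(), digest_size=8).digest()
    return int.from_bytes(d, "big")

def minhash_sketch(a: set[str], k: int = 256) -> list[int]:
    # Coordinate i of the sketch is the minimum of h_i over the set.
    return [min(_h(i, x) for x in a) for i in range(k)]

def estimate_jaccard(sa: list[int], sb: list[int]) -> float:
    # Each coordinate agrees with probability (about) J(A, B); average over k.
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)

A = word_set("To be or not to be")          # {to, be, or, not}
B = word_set("To sketch or not to sketch")  # {to, sketch, or, not}
print(jaccard(A, B))  # 0.6  (3 shared words out of 5)
print(estimate_jaccard(minhash_sketch(A), minhash_sketch(B)))  # close to 0.6
```

The sketch has k integers regardless of set size, and the estimate's standard deviation is about $\sqrt{J(1-J)/k}$.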

General Theory: embeddings. Distances: Euclidean distance ($\ell_2$), Hamming distance, edit distance between two strings, Earth-Mover (transportation) distance. Problems: compute the distance between two points; diameter/close pair of a set S; clustering, MST, etc.; nearest neighbor search. An embedding f reduces a problem under a hard distance to the same problem under a simpler distance.
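As a toy instance of an embedding f (our illustration, not taken from the slides): integer vectors in $\{0,\dots,M\}^d$ under $\ell_1$ distance embed isometrically into Hamming space by unary encoding, so any Hamming-distance algorithm applies unchanged. The function names below are hypothetical.

```python
def unary_embed(x: list[int], M: int) -> list[int]:
    """Embed x in {0,...,M}^d (l1 distance) into {0,1}^(d*M) (Hamming distance)."""
    bits: list[int] = []
    for xi in x:
        if not 0 <= xi <= M:
            raise ValueError("each coordinate must lie in {0, ..., M}")
        # x_i ones followed by M - x_i zeros
        bits.extend([1] * xi + [0] * (M - xi))
    return bits

def l1(x: list[int], y: list[int]) -> int:
    return sum(abs(a - b) for a, b in zip(x, y))

def hamming(u: list[int], v: list[int]) -> int:
    return sum(a != b for a, b in zip(u, v))

x, y, M = [3, 0, 5], [1, 4, 5], 5
print(l1(x, y), hamming(unary_embed(x, M), unary_embed(y, M)))  # 6 6
```

The price is dimension: d coordinates become dM bits, which is why dimension reduction matters next.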

Embeddings: landscape

Dimension Reduction

Main intuition

1D embedding


Full Dimension Reduction

Concentration

Dimension Reduction: wrap-up
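A minimal sketch of Gaussian dimension reduction in the Johnson-Lindenstrauss style, assuming that is the construction these slides cover. The 1D embedding is $f(x) = g \cdot x$ with $g \sim N(0, I_d)$, so $f(x) - f(y) \sim N(0, \|x-y\|^2)$; stacking k such coordinates and scaling by $1/\sqrt{k}$ preserves every pairwise distance up to $1 \pm \epsilon$ with probability $1 - \delta$ once $k = O(\epsilon^{-2} \log(1/\delta))$. The name `jl_project` and the sizes are illustrative choices.

```python
import numpy as np

def jl_project(X: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Map each row of X (shape n x d) to R^k using a scaled Gaussian matrix."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Row j of G is one "1D embedding" direction g_j ~ N(0, I_d).
    # Dividing by sqrt(k) makes E[||F(x)||^2] = ||x||^2 for every x.
    G = rng.standard_normal((k, d)) / np.sqrt(k)
    return X @ G.T

# Usage: two random points in R^2000 reduced to R^400.
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 2_000))
Y = jl_project(X, k=400)
ratio = np.linalg.norm(Y[0] - Y[1]) / np.linalg.norm(X[0] - X[1])
print(Y.shape, round(float(ratio), 3))  # ratio is close to 1
```

Because the map is linear and oblivious (it does not look at the data), the same G must be applied to the data points and to later queries.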

NNS for Euclidean space [Datar-Immorlica-Indyk-Mirrokni'04]
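The basic hash in [DIIM'04] projects onto a random Gaussian direction and cuts the line into randomly shifted segments of width w: $h(p) = \lfloor (g \cdot p + b)/w \rfloor$ with $g \sim N(0, I_d)$ and $b$ uniform in $[0, w)$. A minimal sketch; the class and helper names and the parameter values are hypothetical.

```python
import numpy as np

class PStableHash:
    """One hash h(p) = floor((g . p + b) / w), g ~ N(0, I_d), b ~ U[0, w)."""

    def __init__(self, d: int, w: float, rng: np.random.Generator):
        self.g = rng.standard_normal(d)
        self.b = rng.uniform(0.0, w)
        self.w = w

    def __call__(self, p: np.ndarray) -> int:
        return int(np.floor((self.g @ p + self.b) / self.w))

def collision_rate(p: np.ndarray, q: np.ndarray, w: float,
                   trials: int = 2000, seed: int = 0) -> float:
    # Empirical Pr[h(p) == h(q)] over freshly drawn hash functions.
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        h = PStableHash(p.shape[0], w, rng)
        hits += h(p) == h(q)
    return hits / trials

d = 50
p = np.zeros(d)
e = np.zeros(d)
e[0] = 1.0
print(collision_rate(p, p + 1.0 * e, w=4.0))  # near pair: high rate
print(collision_rate(p, p + 4.0 * e, w=4.0))  # far pair: lower rate
```

Only the distance matters, since $g \cdot (p - q) \sim N(0, \|p-q\|^2)$. An index would concatenate several such hashes per table and use many tables.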

Near-Optimal LSH [A-Indyk'06]
Regular grid → grid of balls: a point p can hit empty space, so take more such grids until p is in a ball.
This needs (too) many grids of balls, so start by projecting into dimension $t$.
The analysis gives the LSH exponent $\rho$ in terms of $t$; it approaches $1/c^2$ as $t$ grows.
Choice of reduced dimension $t$? It trades off the number of hash tables, $n^{\rho}$, against the time to hash, $t^{O(t)}$.
Total query time: $d\,n^{1/c^2 + o(1)}$.
[Figure: 2D example with point p; projection to $\mathbb{R}^t$]
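A simplified sketch of the "grid of balls" hash in the projected space $\mathbb{R}^t$. The parameters are hypothetical: balls of radius w centered on a cubic grid of spacing 4w, each grid independently and randomly shifted; the hash of p is the first grid whose balls contain p plus the ball's index. Each grid covers a fraction $V_t(w)/(4w)^t$ of space, so the expected number of grids tried grows very quickly with t, which is the $t^{O(t)}$ hashing cost above.

```python
from math import gamma, pi

import numpy as np

def ball_grid_hash(p: np.ndarray, w: float, seed: int = 0, max_grids: int = 100_000):
    """Return (grid index, ball index) for the first shifted grid of balls containing p."""
    rng = np.random.default_rng(seed)  # same seed => same grid sequence for every point
    t = p.shape[0]
    side = 4.0 * w  # hypothetical grid spacing; balls of radius w do not overlap
    for j in range(max_grids):
        shift = rng.uniform(0.0, side, size=t)
        idx = np.round((p - shift) / side)  # nearest grid point in this grid
        center = idx * side + shift
        if np.linalg.norm(p - center) <= w:
            return j, tuple(int(v) for v in idx)
    return None  # p landed in empty space in every grid tried

def coverage(t: int) -> float:
    # Volume of a radius-w ball in R^t divided by the cell volume (4w)^t.
    return pi ** (t / 2) / gamma(t / 2 + 1) / 4 ** t

p = np.array([0.3, 0.7])
print(ball_grid_hash(p, w=1.0))
for t in (2, 4, 8):
    print(t, 1 / coverage(t))  # expected number of grids to try grows quickly with t
```

Two points get the same hash only if they fall in the same ball of the same (first successful) grid, which is what makes nearby points collide more often than under a plain grid.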

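Open question: a space partition such that
$$\Pr[\text{needle of length } 1 \text{ is not cut}]^{c^2} \ge \Pr[\text{needle of length } c \text{ is not cut}],$$
equivalently, an LSH exponent $\rho = \ln(1/P_1)/\ln(1/P_c) \le 1/c^2$, where $P_1$ and $P_c$ denote the two probabilities.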
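A small calculation showing why a plain regular grid does not meet this condition. For a randomly shifted 1D grid of spacing w, a needle of length $\ell \le w$ is cut with probability $\ell/w$, so $P_1 = 1 - 1/w$ and $P_c = 1 - c/w$; the resulting $\rho$ tends to $1/c$, not $1/c^2$. The helper names are hypothetical.

```python
import math

def rho(p1: float, pc: float) -> float:
    """LSH exponent rho = ln(1/p1) / ln(1/pc)."""
    return math.log(1.0 / p1) / math.log(1.0 / pc)

def grid_not_cut(length: float, w: float) -> float:
    """Pr[a needle of this length is not cut] by a randomly shifted 1D grid of spacing w."""
    return max(0.0, 1.0 - length / w)

def meets_condition(p1: float, pc: float, c: float) -> bool:
    # The open-question inequality: P_1^(c^2) >= P_c.
    return p1 ** (c * c) >= pc

c = 2.0
for w in (10.0, 1e6):
    p1, pc = grid_not_cut(1.0, w), grid_not_cut(c, w)
    print(w, rho(p1, pc), meets_condition(p1, pc, c))  # rho near 1/c = 0.5, condition fails
```

With w = 10 and c = 2, $\rho \approx 0.47$, well above the target $1/c^2 = 0.25$.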

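Time-Space Trade-offs (columns: Space, Time, Comment, Reference). The trade-off runs from high space with low query time (one hash table lookup!) through medium space and time to low space with high query time. Upper bounds: [AI'06], [KOR'98, IM'98, Pan'06], [Ind'01, Pan'06], [DIIM'04, AI'06], [IM'98]. Lower bounds: space $n^{o(1/\epsilon^2)}$ requires $\omega(1)$ memory lookups [AIP'06]; space $n^{1+o(1/c^2)}$ requires $\omega(1)$ memory lookups [PTW'08, PTW'10].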

NNS beyond LSH

Finale