Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. Piotr Indyk, Rajeev Motwani. The 30th Annual ACM Symposium on Theory of Computing.

Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. Piotr Indyk, Rajeev Motwani. The 30th Annual ACM Symposium on Theory of Computing, 1998.

Problems
Nearest neighbor (NN) problem:
- Given a set of n points P = {p_1, …, p_n} in some metric space X, preprocess P so as to efficiently answer queries, each of which asks for the point in P closest to a query point q ∈ X.
Approximate nearest neighbor (ANN) problem:
- Find a point p ∈ P that is an ε-approximate nearest neighbor of the query q, in that for all p' ∈ P, d(p, q) ≤ (1 + ε) d(p', q).
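The two definitions above can be made concrete with a brute-force baseline. This is a minimal sketch, not the paper's algorithm: it assumes the metric space is Euclidean space with the standard distance, and the function names `nearest_neighbor` and `is_eps_approx_nn` are hypothetical, introduced only for illustration.

```python
import math
from typing import Sequence

Point = Sequence[float]


def dist(p: Point, q: Point) -> float:
    # Assumption: Euclidean distance stands in for the metric d.
    return math.dist(p, q)


def nearest_neighbor(points: Sequence[Point], q: Point) -> Point:
    # Exact NN by linear scan: O(n) distance computations per query.
    return min(points, key=lambda p: dist(p, q))


def is_eps_approx_nn(p: Point, points: Sequence[Point], q: Point, eps: float) -> bool:
    # p is an eps-approximate NN of q if d(p, q) <= (1 + eps) * d(p', q)
    # for every p' in P, i.e. for the closest p' in particular.
    best = min(dist(x, q) for x in points)
    return dist(p, q) <= (1 + eps) * best
```

The linear scan needs no preprocessing but touches every point on every query; the data structures in the rest of the talk aim to do better at the cost of returning an approximate answer.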

Motivation
The nearest neighbors problem is of major importance to a variety of applications, usually involving similarity searching:
- Data compression
- Databases and data mining
- Information retrieval
- Image and video databases
- Machine learning
- Pattern recognition
- Statistics and data analysis
Curse of dimensionality
- The curse of dimensionality is a term coined by Richard Bellman to describe the problem caused by the exponential increase in volume associated with adding extra dimensions to a (mathematical) space.

Overview of results and techniques
These results are obtained by reducing ε-NNS to a new problem: point location in equal balls (PLEB).

Content
[Diagram relating: nearest neighbor search (NNS); ε-nearest neighbor search (ε-NNS); ring-cover trees; point location in equal balls (PLEB); ε-point location in equal balls (ε-PLEB); locality-sensitive hashing; the bucketing method; random projections; Propositions 1, 2, and 3.]

Definitions

Theorems

Constructing Ring-cover trees

Analysis of Ring-cover trees

Definitions

Locality-Sensitive Hashing

The Bucketing method
We decompose each ball into a bounded number of cells and store them in a dictionary. The bucketing algorithm works for any l_p norm.
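The idea of decomposing balls into cells and storing them in a dictionary can be sketched as follows. This is an illustrative sketch under stated assumptions, not the construction as presented in the talk: it assumes the Euclidean (l_2) norm, equal balls of radius r centered at the data points, and a uniform grid of side ε·r/√d, so that a cell's diameter is ε·r. The names `build_buckets` and `query` are hypothetical.

```python
import itertools
import math


def build_buckets(points, r, eps):
    """Map every grid cell that intersects some ball B(p, r) to one such p."""
    d = len(points[0])
    side = eps * r / math.sqrt(d)  # assumed cell side; cell diameter is eps * r
    table = {}
    for p in points:
        # Range of cell indices covered by the bounding box of B(p, r).
        lo = [math.floor((c - r) / side) for c in p]
        hi = [math.floor((c + r) / side) for c in p]
        for cell in itertools.product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            # Squared distance from p to the closest point of this cell.
            gap2 = 0.0
            for i, c in zip(cell, p):
                nearest = min(max(c, i * side), (i + 1) * side)
                gap2 += (c - nearest) ** 2
            if gap2 <= r * r:  # the cell intersects the ball
                table.setdefault(cell, p)
    return table, side


def query(table, side, q):
    """Return a stored center whose ball meets q's cell, or None."""
    cell = tuple(math.floor(c / side) for c in q)
    return table.get(cell)
```

Under these assumptions, if q lies within distance r of some point, its cell is in the dictionary, and any center returned is within r plus one cell diameter of q, that is, within (1 + ε)·r. The number of cells per ball grows exponentially with the dimension d, so a sketch like this is only practical in low dimensions.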

Johnson-Lindenstrauss (J. L.) Lemma