Summer School on Hashing’14 Dimension Reduction Alex Andoni (Microsoft Research)

Slides:



Advertisements
Similar presentations
A Nonlinear Approach to Dimension Reduction Robert Krauthgamer Weizmann Institute of Science Joint work with Lee-Ad Gottlieb TexPoint fonts used in EMF.
Advertisements

1 Approximating Edit Distance in Near-Linear Time Alexandr Andoni (MIT) Joint work with Krzysztof Onak (MIT)
Algorithmic High-Dimensional Geometry 1 Alex Andoni (Microsoft Research SVC)
Spectral Approaches to Nearest Neighbor Search arXiv: Robert Krauthgamer (Weizmann Institute) Joint with: Amirali Abdullah, Alexandr Andoni, Ravi.
Image acquisition using sparse (pseudo)-random matrices Piotr Indyk MIT.
Overcoming the L 1 Non- Embeddability Barrier Robert Krauthgamer (Weizmann Institute) Joint work with Alexandr Andoni and Piotr Indyk (MIT)
MIT CSAIL Vision interfaces Towards efficient matching with random hashing methods… Kristen Grauman Gregory Shakhnarovich Trevor Darrell.
Efficiently searching for similar images (Kristen Grauman)
Similarity Search for Adaptive Ellipsoid Queries Using Spatial Transformation Yasushi Sakurai (NTT Cyber Space Laboratories) Masatoshi Yoshikawa (Nara.
Nearest Neighbor Search in high-dimensional spaces Alexandr Andoni (Microsoft Research)
Sketching and Embedding are Equivalent for Norms Alexandr Andoni (Simons Institute) Robert Krauthgamer (Weizmann Institute) Ilya Razenshteyn (CSAIL MIT)
Spectral Approaches to Nearest Neighbor Search Alex Andoni Joint work with:Amirali Abdullah Ravi Kannan Robi Krauthgamer.
Coherency Sensitive Hashing (CSH) Simon Korman and Shai Avidan Dept. of Electrical Engineering Tel Aviv University ICCV2011 | 13th International Conference.
Nearest Neighbor Search in High-Dimensional Spaces Alexandr Andoni (Microsoft Research Silicon Valley)
Indexing Time Series. Time Series Databases A time series is a sequence of real numbers, representing the measurements of a real variable at equal time.
Dimensionality Reduction
Approximate Nearest Subspace Search with Applications to Pattern Recognition Ronen Basri, Tal Hassner, Lihi Zelnik-Manor presented by Andrew Guillory and.
Data-Powered Algorithms Bernard Chazelle Princeton University Bernard Chazelle Princeton University.
Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform Nir Ailon, Bernard Chazelle (Princeton University)
Spatial and Temporal Data Mining
Efficient Nearest-Neighbor Search in Large Sets of Protein Conformations Fabian Schwarzer Itay Lotan.
Euripides G.M. PetrakisIR'2001 Oulu, Sept Indexing Images with Multiple Regions Euripides G.M. Petrakis Dept.
Y. Weiss (Hebrew U.) A. Torralba (MIT) Rob Fergus (NYU)
Summer School on Hashing’14 Locality Sensitive Hashing Alex Andoni (Microsoft Research)
Optimal Data-Dependent Hashing for Approximate Near Neighbors
Sketching and Embedding are Equivalent for Norms Alexandr Andoni (Simons Inst. / Columbia) Robert Krauthgamer (Weizmann Inst.) Ilya Razenshteyn (MIT, now.
Approximate Nearest Subspace Search with applications to pattern recognition Ronen Basri Tal Hassner Lihi Zelnik-Manor Weizmann Institute Caltech.
Embedding and Sketching Alexandr Andoni (MSR). Definition by example  Problem: Compute the diameter of a set S, of size n, living in d-dimensional ℓ.
Approximation algorithms for large-scale kernel methods Taher Dameh School of Computing Science Simon Fraser University March 29 th, 2010.
Multimedia and Time-series Data
O AK R IDGE N ATIONAL L ABORATORY U. S. D EPARTMENT OF E NERGY RobustMap: A Fast and Robust Algorithm for Dimension Reduction and Clustering Lionel F.
Beyond Locality Sensitive Hashing Alex Andoni (Microsoft Research) Joint with: Piotr Indyk (MIT), Huy L. Nguyen (Princeton), Ilya Razenshteyn (MIT)
Fast Similarity Search for Learned Metrics Prateek Jain, Brian Kulis, and Kristen Grauman Department of Computer Sciences University of Texas at Austin.
Nearest Neighbor Search in high-dimensional spaces Alexandr Andoni (Princeton/CCI → MSR SVC) Barriers II August 30, 2010.
Sketching and Nearest Neighbor Search (2) Alex Andoni (Columbia University) MADALGO Summer School on Streaming Algorithms 2015.
Similarity Searching in High Dimensions via Hashing Paper by: Aristides Gionis, Poitr Indyk, Rajeev Motwani.
Manifold learning: MDS and Isomap
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality Piotr Indyk, Rajeev Motwani The 30 th annual ACM symposium on theory of computing.
Euripides G.M. PetrakisIR'2001 Oulu, Sept Indexing Images with Multiple Regions Euripides G.M. Petrakis Dept. of Electronic.
Sketching, Sampling and other Sublinear Algorithms: Euclidean space: dimension reduction and NNS Alex Andoni (MSR SVC)
Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp , 2010.
Optimal Data-Dependent Hashing for Nearest Neighbor Search Alex Andoni (Columbia University) Joint work with: Ilya Razenshteyn.
Sketching and Embedding are Equivalent for Norms Alexandr Andoni (Columbia) Robert Krauthgamer (Weizmann Inst) Ilya Razenshteyn (MIT) 1.
Debrup Chakraborty Non Parametric Methods Pattern Recognition and Machine Learning.
Big Data Lecture 5: Estimating the second moment, dimension reduction, applications.
Support Vector Machines (SVM)
Data-dependent Hashing for Similarity Search
It is time for a change: Two New Deep Learning Algorithms beyond Backpropagation By Bojan PLOJ.
Fast Dimension Reduction MMDS 2008
Metric Learning for Clustering
Sublinear Algorithmic Tools 3
Lecture 11: Nearest Neighbor Search
Sublinear Algorithmic Tools 2
Dimension reduction techniques for lp (1<p<2), with applications
Spectral Approaches to Nearest Neighbor Search [FOCS 2014]
Lecture 10: Sketching S3: Nearest Neighbor Search
Sketching and Embedding are Equivalent for Norms
Lecture 7: Dynamic sampling Dimension Reduction
Data-Dependent Hashing for Nearest Neighbor Search
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Neuro-Fuzzy and Soft Computing for Speaker Recognition (語者辨識)
Fast Nearest Neighbor Search in the Hamming Space
Advisor: Professor Rafail Ostrovsky
Embedding and Sketching
Dimension versus Distortion a.k.a. Euclidean Dimension Reduction
Lecture 15: Least Square Regression Metric Embeddings
Approximating Edit Distance in Near-Linear Time
Ronen Basri Tal Hassner Lihi Zelnik-Manor Weizmann Institute Caltech
Sublinear Algorihms for Big Data
Presentation transcript:

Summer School on Hashing’14 Dimension Reduction Alex Andoni (Microsoft Research)

Nearest Neighbor Search (NNS)

Motivation  Generic setup:  Points model objects (e.g. images)  Distance models (dis)similarity measure  Application areas:  machine learning: k-NN rule  speech/image/video/music recognition, vector quantization, bioinformatics, etc…  Distance can be:  Hamming,Euclidean, edit distance, Earth-mover distance, etc…  Primitive for other problems:  find the similar pairs in a set D, clustering…

2D case

High-dimensional case AlgorithmQuery timeSpace Full indexing No indexing – linear scan

Dimension Reduction

Johnson Lindenstrauss Lemma

Idea:

1D embedding

22

Full Dimension Reduction

Concentration

Johnson Lindenstrauss: wrap-up

Faster JL ? 14

Fast JL Transform 15 normalization constant

Fast JLT: sparse projection 16

FJLT: full 17 Projection: sparse matrix “spreading around” Hadamard (Fast Fourier Transform) Diagonal

Spreading around: intuition 18 Projection: sparse matrix Hadamard (Fast Fourier Transform) Diagonal “spreading around”

Spreading around: proof 19

20

21

FJLT: wrap-up 22

Dimension Reduction: beyond Euclidean 23

24

Weak embedding [Indyk’00] 25