FLANN: Fast Library for Approximate Nearest Neighbors

FLANN: Fast Library for Approximate Nearest Neighbors
Marius Muja and David G. Lowe, University of British Columbia
Presented by Mohammad Sadegh Riazi, Rice University

Outline
- Introduction
  - What is FLANN?
  - Which programming languages does it support?
- Applications
- Approaches
  - Randomized k-d Tree Algorithm
  - The Priority Search K-Means Tree Algorithm
- Experiments
  - Data Dimensionality
  - Search Precision
  - Automatic Selection of the Optimal Algorithm
- Scaling Nearest Neighbor Search
- What we are going to do
- References

What is FLANN?
- FLANN is a library for performing fast approximate nearest neighbor searches in high-dimensional spaces.
- It includes a system for automatically choosing the best algorithm and optimal parameters for a given dataset.
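
A minimal sketch of FLANN's C++ interface, modeled on the example in the FLANN manual; the random data and the parameter values are illustrative only.

```cpp
// Build a FLANN index over random data and run a k-NN query.
#include <flann/flann.hpp>
#include <cstdlib>

int main() {
    const int rows = 1000, dim = 128, nn = 5;

    // Wrap a flat float array as a FLANN matrix (rows x dim).
    flann::Matrix<float> dataset(new float[rows * dim], rows, dim);
    for (int i = 0; i < rows * dim; ++i)
        dataset[0][i] = rand() / float(RAND_MAX);  // rows are contiguous

    // Build the index; the params object selects the algorithm.
    flann::Index<flann::L2<float>> index(dataset, flann::KDTreeIndexParams(4));
    index.buildIndex();

    // Query with the first dataset row; it should be its own nearest neighbor.
    flann::Matrix<float> query(dataset[0], 1, dim);
    flann::Matrix<int> indices(new int[nn], 1, nn);
    flann::Matrix<float> dists(new float[nn], 1, nn);
    index.knnSearch(query, indices, dists, nn, flann::SearchParams(128));

    delete[] dataset.ptr();
    delete[] indices.ptr();
    delete[] dists.ptr();
    return 0;
}
```

The later snippets continue this sketch and reuse dataset, query, indices, dists, and nn.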

Which programming languages does it support?
- Written in C++
- Bindings for C, MATLAB, and Python

Applications
- Cluster analysis
- Pattern recognition
- Statistical classification
- Computational geometry
- Data compression
- Databases

Approaches: Multiple Randomized k-d Trees
- Multiple trees are searched in parallel.
- The splitting dimension at each node is chosen at random from the N_D dimensions with the highest variance.
- N_D is fixed to 5.
- Typically around 20 trees are used.
- Gives the best performance on most of the tested datasets.
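
Continuing the earlier sketch, this is how the randomized k-d tree algorithm could be selected through FLANN's C++ API; the tree count of 20 is the value quoted above, not a universal default.

```cpp
// 20 randomized k-d trees, searched simultaneously at query time.
// More trees raise precision at the cost of memory and build time.
flann::Index<flann::L2<float>> kdtrees(dataset, flann::KDTreeIndexParams(20));
kdtrees.buildIndex();
kdtrees.knnSearch(query, indices, dists, nn, flann::SearchParams(128));
```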

Approaches: The Priority Search K-Means Tree
- Partitions the data into K distinct regions at each level.
- Each region is recursively partitioned until a leaf node holds no more than K items.
- The initial cluster centers are picked by random selection or by Gonzales' algorithm.
- I_max is the maximum number of k-means iterations used when forming the regions.
- Gives better performance than the k-d trees at higher precisions.
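
Continuing the sketch, a k-means tree can be requested as follows; the branching factor and iteration count are illustrative, and further constructor arguments (per the FLANN manual) select how the initial centers are chosen.

```cpp
// Priority search k-means tree: branching factor K = 32, at most
// 11 k-means iterations per level (I_max in the slide's notation).
flann::Index<flann::L2<float>> kmtree(dataset, flann::KMeansIndexParams(32, 11));
kmtree.buildIndex();
```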

Approaches: Complexity Comparison. (The comparison on this slide is not captured in the transcript.)

Experiments: Data Dimensionality
- Dimensionality has a great impact on nearest neighbor matching performance.
- Whether performance decreases or increases with dimensionality is highly correlated with the type of data.
- For random data, performance drops sharply as dimensionality grows.

Experiments: Data Dimensionality (continued)
- For image patches and other real-life data, however, performance improves as dimensionality increases.
- This can be explained by correlation between dimensions: each dimension carries some information about the others, so a few search iterations are likely to find the exact nearest neighbor.

Experiments: Search Precision
- The desired search precision determines the achievable speedup.
- Accepting a precision as low as 60% yields a speedup of three orders of magnitude.
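
In FLANN's C++ API, this precision/speed trade-off is exposed at query time through the checks field of SearchParams, which caps how many leaf nodes are examined; the two values below are illustrative, continuing the earlier sketch.

```cpp
// Fewer checks: faster search, lower precision.
kmtree.knnSearch(query, indices, dists, nn, flann::SearchParams(32));
// More checks: slower search, higher precision.
kmtree.knnSearch(query, indices, dists, nn, flann::SearchParams(512));
```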

Experiments: Automatic Selection of the Optimal Algorithm
- The algorithm itself is treated as a parameter.
- Each algorithm performs differently on different datasets.
- Each algorithm also has its own internal parameters.
- Influential factors: dimensionality, the size and structure of the data (correlation?), and the desired precision.
- The k-means tree and the randomized k-d trees give the best performance on most datasets.

Experiments: How to Find the Best Internal Parameters
- First, a global grid search locates a promising region of the parameter space.
- The result is then refined locally with the Nelder-Mead downhill simplex method.
- The optimization can be run on the full dataset or on a fraction of it.
- This is done for every available algorithm, and the best algorithm together with its internal parameters is selected.
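
FLANN packages this tuning procedure behind an autotuned index type; a sketch of its use follows (constructor argument order per the FLANN manual, the values themselves illustrative).

```cpp
// The autotuned index runs the parameter search at build time and
// keeps the winning algorithm and settings.
flann::Index<flann::L2<float>> tuned(
    dataset,
    flann::AutotunedIndexParams(0.9f,    // target precision (fraction of exact neighbors)
                                0.01f,   // weight of build time vs. search time
                                0.0f,    // weight of memory use
                                0.1f));  // fraction of the data used for tuning
tuned.buildIndex();
```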

Scaling Nearest Neighbor Search
- Larger-scale datasets allow better matching performance.
- Problem: they may NOT fit into the memory of a single machine.
- Possible solutions:
  - Dimensionality reduction
  - Keeping the data on disk and loading it into memory on demand (poor performance)
  - Distributing the data across multiple machines ← the approach taken

Scaling Nearest Neighbor Search (continued)
- Nearest neighbor matching is distributed among N machines with a Map-Reduce-like algorithm built on the Message Passing Interface (MPI).
- Each machine only has to index and search 1/N of the whole dataset.
- A client sends the query to one of the computers in the MPI cluster (the master server).
- The master server broadcasts the query to all processes in the cluster.
- Each process runs the search in parallel on its own fraction of the data.
- When the search completes, an MPI reduce operation merges the partial results back to the master process, and the final result is returned to the client.
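
FLANN ships its own MPI-based distributed index; the sketch below is NOT that implementation but a stripped-down illustration of the broadcast / local-search / merge pattern just described. The helper local_knn() is hypothetical and stands in for a per-rank FLANN search, and merging here uses a gather plus sort on the master rather than a custom MPI reduce.

```cpp
// Pattern: master broadcasts the query, every rank searches its shard,
// the master merges the per-rank candidates into the global top-nn.
#include <mpi.h>
#include <algorithm>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int dim = 128, nn = 5;
    std::vector<float> query(dim);
    // Rank 0 would receive the query from the client (elided here),
    // then broadcast it to every process in the cluster.
    MPI_Bcast(query.data(), dim, MPI_FLOAT, 0, MPI_COMM_WORLD);

    // Each rank searches its own 1/size shard of the data and produces
    // its nn best distances (hypothetical helper standing in for FLANN).
    std::vector<float> local_dists(nn, 1e30f);
    // local_knn(query, local_dists);

    // Collect every rank's candidates on the master and keep the best nn.
    std::vector<float> all_dists(static_cast<size_t>(nn) * size);
    MPI_Gather(local_dists.data(), nn, MPI_FLOAT,
               all_dists.data(), nn, MPI_FLOAT, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        std::partial_sort(all_dists.begin(), all_dists.begin() + nn,
                          all_dists.end());
        // all_dists[0..nn) now holds the merged nearest distances;
        // the matching indices would be gathered the same way.
    }
    MPI_Finalize();
    return 0;
}
```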

Scaling Nearest Neighbor Search: implementation using the Message Passing Interface (MPI). (The architecture diagram on this slide is not captured in the transcript.)

What we are going to do
- Develop and simulate an approach for preprocessing the input queries to obtain better performance.
- Try to group the input data so that we do not need to search through all the data in the tree.
- This involves a trade-off between throughput and latency.

References
[1] Marius Muja and David G. Lowe, "Scalable Nearest Neighbor Algorithms for High Dimensional Data," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 36, 2014.
[2] Marius Muja and David G. Lowe, "Fast Matching of Binary Features," Conference on Computer and Robot Vision (CRV), 2012.
[3] Marius Muja and David G. Lowe, "Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration," International Conference on Computer Vision Theory and Applications (VISAPP'09), 2009.

Thank you for your attention

Questions?