An Approximate Nearest Neighbor Retrieval Scheme for Computationally Intensive Distance Measures
Pratyush Bhatt, MS by Research (CVIT)

Nearest Neighbor Retrieval
Representation of an object
– Fixed Length
– Variable Length
(Dis)similarity Function (Distance Measure)
Neighborhood
The nearest neighbor retrieval problem can now be formalized as retrieving the objects similar to a given object, where similarity is in accordance with a given similarity function.

Need for NN retrieval schemes
If the search space is small, sequential search gives exact results.
With cheaper memory, the volume of data stored online has grown.
– Sequential search becomes time consuming
– Need to index data for fast retrieval
– Birth of NN search algorithms
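For reference, a minimal sketch of the brute-force sequential search that indexing schemes try to avoid; the Euclidean distance and random data here are only placeholders for the expensive measures discussed next:

```python
import numpy as np

def sequential_knn(query, database, distance_fn, k):
    """Brute-force k-NN: one exact distance computation per database object."""
    dists = np.array([distance_fn(query, x) for x in database])  # N exact computations
    order = np.argsort(dists)                                    # rank all N objects
    return order[:k], dists[order[:k]]

# Toy usage with a cheap distance; with an expensive measure this loop dominates.
db = np.random.randn(1000, 16)
q = np.random.randn(16)
idx, d = sequential_knn(q, db, lambda a, b: np.linalg.norm(a - b), k=5)
```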

Computationally Expensive Distance Measures
X1 is visually more similar to X2 than to X3, yet the L1 distance between X1 and X2 is larger than that between X1 and X3.

Time complexity is super-linear in the length of the input.
Edit Distance: O(d^2)
Chamfer Distance
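DTW, the measure used in the experiments later, shows the same quadratic behavior; a naive dynamic-programming sketch makes the d x d table fill explicit:

```python
import numpy as np

def dtw(a, b):
    """Naive DTW between two 1-D sequences: O(len(a) * len(b)) table fill."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three possible warping paths.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```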

Nearest Neighbor Classification
With computationally expensive distance functions:
– Nearest neighbor classifiers are often impractical for real applications.
– Classifying a single object can take over 20 minutes on a modern PC, even with an optimized C++ implementation.
– The larger the available training data, the better the accuracy, but at the cost of higher computation time.

Motivation (Approx. NN)
Why compute similarity with all samples when the decision is based only on the top K?
How can the top K be found without computing similarity with all the samples?
Solution: Compute K1 > K approximate NN and find the best K within that list. To compute the K1-NN, use only a few explicit matches.
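A minimal sketch of this filter-and-refine idea, with cheap_dist and exact_dist as placeholder callables; the actual cheap ranking in HLM comes from the hierarchical embedding described later:

```python
import numpy as np

def filter_and_refine(query, database, cheap_dist, exact_dist, k, k1):
    """Rank all N objects with a cheap proxy, then apply the expensive
    distance only to the k1 best candidates and return the top k."""
    assert k1 >= k
    approx = np.array([cheap_dist(query, x) for x in database])
    candidates = np.argsort(approx)[:k1]                  # k1 approximate NN
    exact = {i: exact_dist(query, database[i]) for i in candidates}
    return sorted(exact, key=exact.get)[:k]               # best k by exact distance
```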

Problem Statement
Improve nearest neighbor retrieval and classification performance in spaces with computationally expensive distance measures.
Generate an expansible approximate nearest neighbor list for a given query.
Deal with points in a non-metric space.

Metric Space

Tree based (KD-tree, R-tree)
Hashing based

KD Tree for Metric Space

Non-Metric Space

Wrist Rotation

Applications
Classification based on K-NN
– K is determined empirically
Identification
– Stop when similarity is above a fixed threshold
Retrieval applications
– Optimizing network usage in peer-to-peer computer networks
– Content-based retrieval systems
The concept of similarity is abstract
– Need to generate an expansible list
– Learn from user feedback

Challenges
Accuracy depends on the similarity function used to compute the approx. NN
The list cannot be pre-computed; it should be generated on-the-fly
Should be incremental to support scalability
Non-metric space
– Prohibits application of the triangle inequality

Manifold Theory
Given a set of high-dimensional points X = {x_1, ..., x_N}, x_i in R^D, find an embedding {y_1, ..., y_N} in a low-dimensional space R^d, d << D, that
– Preserves local similarity
– Assumption: data is well distributed

Run MDS on the N x N similarity matrix to get the embedding of the training samples.
The embedding of a new sample is obtained from the column means of the squared distance matrix and the new sample's distances to the training points, which requires computing the similarity of the new sample to all N points.
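A minimal sketch, assuming classical MDS on a distance matrix and the standard out-of-sample extension; it is meant to show why embedding a new sample still needs its distances to all N training points, not to reproduce the exact construction used in the thesis:

```python
import numpy as np

def classical_mds(D, dim):
    """Classical MDS from an N x N distance matrix D."""
    n = D.shape[0]
    D2 = D ** 2
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ D2 @ J                         # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:dim]         # keep the largest eigenvalues
    L, V = evals[order], evecs[:, order]
    X = V * np.sqrt(np.maximum(L, 0))             # N x dim embedding of training data
    return X, V, L, D2.mean(axis=0)               # also return column means of D^2

def embed_new_sample(d_new, V, L, col_means):
    """Out-of-sample embedding: d_new holds distances from the new sample to
    ALL N training points, which is exactly the cost HLM tries to avoid."""
    return 0.5 * (V.T @ (col_means - d_new ** 2)) / np.sqrt(np.maximum(L, 1e-12))
```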

Related Work
FastMap, Random Reference Objects, Random Line Projections, VP-Trees
– Find an embedding of the query by computing only a few exact distances
– Assumption: the triangle inequality holds
BoostMap: uses AdaBoost to combine many simple 1-D embeddings
(Illustrations: FastMap, Lipschitz embedding)

Problem Formulation
Goal: Compute the approx. NN of a query point q from a set S of N points, in accordance with a similarity function F.
Solution:
– Split the data into a multi-level hierarchy
– Exploit the local similarity property to direct the search from top to bottom

Hierarchical Local Maps (HLM)

1. Find the similarity of the query with the points at the topmost level
2. Identify the nearest neighbors
3. Get the children of the nearest neighbors
How to find the similarity of the query with these samples without explicitly calculating it?
4. Project the points onto the manifold
5. Project the query onto the manifold
6. Find the nearest neighbors of the query in this metric space
7. Identify these points in the tree
8. Traverse down in the same fashion
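A hedged sketch of this traversal; the Node structure, the embed_query callback that places the query in a neighborhood's local map, and the beam width are illustrative assumptions, not the exact HLM interface:

```python
import numpy as np

class Node:
    """Illustrative tree node: a database object plus its precomputed
    low-dimensional coordinates in the local map of its parent's neighborhood."""
    def __init__(self, point, coords=None, children=None):
        self.point = point
        self.coords = coords
        self.children = children or []

def hlm_search(query, root, exact_dist, embed_query, k, beam=5):
    """Top-down traversal: exact distances only at the topmost level,
    cheap comparisons in the local low-dimensional maps below."""
    # Steps 1-2: exact similarity with the few topmost representatives.
    level = sorted(root.children, key=lambda n: exact_dist(query, n.point))[:beam]
    while any(n.children for n in level):
        # Step 3: children of the current nearest neighbors.
        candidates = [c for n in level for c in n.children]
        # Steps 4-5: place the query in the local map of this neighborhood
        # (embed_query is an assumed callback; no exact distances to candidates).
        q = embed_query(query, level, candidates)
        # Steps 6-8: rank candidates in the embedded metric space and descend.
        candidates.sort(key=lambda c: np.linalg.norm(np.asarray(c.coords) - q))
        level = candidates[:beam]
    return level[:k]
```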

Results
UNIPEN Handwriting Database
– digit examples
– Divided into training and testing sets with a 2:1 ratio
– Distance measure: DTW

Number of DTW Computations for K nearest neighbor retrieval

Classification accuracy on the UNIPEN dataset using exact and approximate k-NN (table columns: K, DTW, HLM, Difference)
Average no. of DTW computations done by HLM: 160
Expected no. of DTW computations done by brute-force: 5315

Biometric (Special Case)
High inter-class variation, low intra-class variation
Low variation in inter-class distances
Indexing for identification
How to apply HLM in such cases?
– The local similarity structure becomes degenerate, destroying any manifold structure

Biometric Data

High-dimensional data
Relative Contrast: (Dmax - Dmin) / Dmin, the gap between the farthest and nearest neighbor distances relative to the nearest; it shrinks as dimensionality grows.
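A small illustrative experiment (not from the original slides) showing how the relative contrast of Euclidean distances collapses as the dimension grows:

```python
import numpy as np

def relative_contrast(dim, n=1000, seed=0):
    """(Dmax - Dmin) / Dmin for one query against n uniform points in [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    data = rng.random((n, dim))
    query = rng.random(dim)
    d = np.linalg.norm(data - query, axis=1)
    return (d.max() - d.min()) / d.min()

for dim in (2, 10, 100, 1000):
    print(dim, relative_contrast(dim))   # contrast shrinks as dim grows
```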

Iris
Feature vector length: 1000 bits
If same class: fewer than 100 bits differ
Else: each bit is equally likely to match or not match
– On average 500 bits differ
Impostor scores would be around 0.5
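To make these numbers concrete, a toy sketch with random 1000-bit codes (illustrative only; real iris codes involve masking and rotation compensation):

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits = 1000

enrolled = rng.integers(0, 2, n_bits)

# Genuine comparison: same template with fewer than 100 bits flipped.
genuine = enrolled.copy()
flip = rng.choice(n_bits, size=80, replace=False)
genuine[flip] ^= 1

# Impostor comparison: an independent random code, each bit matches with p = 0.5.
impostor = rng.integers(0, 2, n_bits)

hd = lambda a, b: np.count_nonzero(a != b) / n_bits   # normalized Hamming distance
print(hd(enrolled, genuine))    # around 0.08
print(hd(enrolled, impostor))   # around 0.5
```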

Biometric Data
Such a distribution is bad for indexing

Softness/Hardness
A measure of the overlap between the genuine and impostor classes
Soft biometric (face, body silhouette)
– Poor classification accuracy
– Better indexing
– Corresponds to a multi-dimensional point
Hard biometric (iris, fingerprint)
– Good classification accuracy
– Poor indexing
– Corresponds to a high-dimensional point
Need a balance between the two

Dataset
CASIA Iris Image Database V3.0
– 855 images corresponding to 285 users in the training and testing sets
– 3 samples per eye
Similarity function
– Hamming distance
– Euclidean distance (softer metric): the average gray value of each block gives a 160-D feature vector
Penetration rate: percentage of the data on which the biometric matcher is run
False reject rate: percentage of identification instances in which a false rejection occurs
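A toy sketch (assumed variable names, not the thesis code) of how these two rates can be computed from an indexing run that returns a candidate list for each probe:

```python
import numpy as np

def penetration_and_frr(candidate_lists, true_ids, database_size):
    """candidate_lists[i]: ids of database entries the matcher was run on for probe i.
    true_ids[i]: enrolled identity of probe i."""
    # Penetration rate: average fraction of the database examined per probe.
    penetration = np.mean([len(c) / database_size for c in candidate_lists])
    # False reject rate: fraction of probes whose true identity never reached the matcher.
    frr = np.mean([tid not in c for c, tid in zip(candidate_lists, true_ids)])
    return penetration, frr
```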

HLM on CASIA Iris Dataset

Synthetic Dataset
Class centers sampled from a 1-D Gaussian N(0, 1); d-dimensional centers generated by sampling d times
Points of the same class sampled from a Gaussian with the class center as mean and varying variance
Total number of classes: 500
Points per class
– Training: 10
– Testing: 5
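A minimal numpy sketch of this generation procedure; the within-class standard deviation is the knob varied in the plots below, and the exact values used in the experiments are not reproduced here:

```python
import numpy as np

def make_synthetic(d, n_classes=500, n_train=10, n_test=5, within_std=0.3, seed=0):
    """Class centers ~ N(0, 1) per dimension; class samples ~ N(center, within_std^2)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(0.0, 1.0, size=(n_classes, d))
    def draw(n):
        pts = centers[:, None, :] + rng.normal(0.0, within_std, size=(n_classes, n, d))
        labels = np.repeat(np.arange(n_classes), n)
        return pts.reshape(-1, d), labels
    return draw(n_train), draw(n_test)

(train_X, train_y), (test_X, test_y) = make_synthetic(d=32)
```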

Indexing performance while varying the number of dimensions

Indexing performance while varying the within-class to between-class variance ratio

Contributions
A representation scheme for objects in a dataset that allows fast retrieval of approximate nearest neighbors in non-Euclidean spaces.
A search mechanism combined with a filter-and-refine approach that minimizes the number of exact distance computations for computationally expensive distance measures.
A study of the performance of our scheme on biometric data and of the parameters affecting it.

Conclusion and Future Work
– The local similarity property is well exploited by HLM
– Incremental and scalable
– A softer biometric in the filtering step combined with a hard biometric in the refine step would drastically reduce computation time
– Optimal construction of HLM
– Defining a measure for a similarity function that allows hierarchical representation
– Learning a function to estimate the degree of indexability, with parameters extracted from the data distribution and the similarity function

Thank You
Related Publication: Pratyush Bhatt and Anoop Namboodiri, "Hierarchical Local Maps for Robust Approximate Nearest Neighbour Computation," Proceedings of the 7th International Conference on Advances in Pattern Recognition (ICAPR 2009), Feb. 4-6, 2009, Kolkata, India.